Preface



The scale-space conference series dates back to the NSF/ESPRIT transatlantic
collaboration on “geometry-driven diffusion” (1993-1996). This collaboration led
to a series of very successful workshops followed by a PhD summer school on
Gaussian Scale-Space Theory in Copenhagen in the spring of 1996. The following
year, the First International Conference on Scale-Space Theory in Computer
Vision was held (Utrecht, July 1997). The series of international conferences
has now grown to three. As was the case for the Second International Conference
(Corfu, September 1999), the 2001 conference was affiliated with the ICCV
as one of its several workshops. Entitled the “Workshop on Scale-Space and
Morphology”, the conference aimed to encourage the exchange of
information and to foster interactions among researchers in scale-space theory
in computer vision and mathematical morphology. With the publication of these
proceedings, we feel that our purposes have been accomplished.

The conference was held for the first time in North America, and succeeded
in attracting participants from the western hemisphere and the Pacific Rim. A
concerted effort was also made to make the workshop attractive to and affordable
for graduate students. Of 60 high-quality submissions (including many papers
in the subject areas that were accepted at ICCV or the overlapping Workshop
on Variational and Level Set Methods), 18 papers were selected for oral
presentations. They form Part I of this volume. Part II of the volume consists of 23
papers accepted for poster presentations. Invited talks were given by Professor
Jitendra Malik, of the Computer Vision Group at the University of California
at Berkeley, and by Professor Amiram Grinvald, of the Weizmann Institute of
Science in Rehovot, Israel.

On behalf of the Program Board, I would like to thank the authors for their 
excellent presentations and written work; the referees for their time and valuable 
comments (each paper was reviewed by 3 referees); and the members of the 
General Board for their guidance in putting the conference together. We are 
grateful to the IEEE Computer Society (especially to Keith Price and Tom Fink 
for their assistance), to the ICCV Workshops Chair Jim Clark, and to Jim Little 
and David Lowe, the local organizers for ICCV in Vancouver. On a personal 
note I would like to thank the members of the Program Board, who kept me 
on course and encouraged me toward the finish line in preparing this volume, 
and to thank Libbie Geiger at the University of Richmond for her secretarial 
assistance. Finally, the conference participants deserve recognition for making 
the event both enjoyable and worthwhile. 



May 2001 



Michael Kerckhove

Using the Vector Distance Functions to Evolve
Manifolds of Arbitrary Codimension



Jose Gomes^1 and Olivier Faugeras^2

^1 I.B.M. Watson Research Center, Yorktown, New York, U.S.A.
^2 I.N.R.I.A. Sophia Antipolis, France and M.I.T., Boston, U.S.A.



Abstract. We present a novel method for representing and evolving
objects of arbitrary dimension. The method, called the Vector Distance
Function (VDF) method, uses the vector that connects any point in space
to its closest point on the object. It can deal with smooth manifolds with
and without boundaries and with shapes of different dimensions. It can
be used to evolve such objects according to a variety of motions,
including mean curvature. If discontinuous velocity fields are allowed, the
dimension of the objects can change. The evolution method that we
propose guarantees that we stay in the class of VDF’s and therefore that
the intrinsic properties of the underlying shapes, such as their dimension
and curvatures, can be read off easily from the VDF and its spatial
derivatives at each time instant. The main disadvantage of the method is its
redundancy: the size of the representation is always that of the ambient
space even though the object we are representing may be of a much lower
dimension. This disadvantage is also one of its strengths since it buys us
flexibility.



1 Introduction and History 

In this paper we present a general method for representing objects of arbitrary
dimension embedded in spaces of arbitrary dimension. The representation method
is also the basis for evolving such objects according to a variety of motions,
including mean curvature. We are not limited to objects of constant dimension:
for example, we can cope with open curves or surfaces, or even with objects such as
the one shown in figure 1, which is the union of an open curve, an open surface,
and a volume.

The history of curves and surfaces in Computer Vision can be traced back to
the early work on snakes by Kass, Witkin and Terzopoulos [?]. This pioneering
work was reformulated by Caselles, Kimmel and Sapiro [?] and by Kichenassamy
et al. in the context of PDE-driven curves and surfaces. There is an extensive
literature that addresses the theoretical aspects of these PDE’s and offers
geometrical interpretations as well as results of uniqueness and existence [?].

The level set methods were introduced by Osher and Sethian in [?] and
provide both a nice theoretical framework and efficient practical tools for solving
such PDE’s. In these methods, the time evolution is achieved by means of a
time-dependent implicit representation of the curve or surface.

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 1-12, 2001.

© Springer-Verlag and IEEE/CS 2001



Fig. 1. [Figure not recoverable. One surviving label reads: point, dimension = 0, codimension = 3.]

But the level set methods were developed in the framework of the evolution of 
closed manifolds of codimension one and the case of higher codimension has been 
less investigated. Some recent contributions in this direction are the following. 

In 1996, Ambrosio and Soner, inspired by ideas of De Giorgi, published a
paper [2] in which they showed that the level-set method could be extended to the
case of arbitrary codimension. Their idea is to replace the evolution of the smooth
manifold under mean curvature motion by that of a tubular neighborhood of the
manifold, in effect a hypersurface. They show that the evolution of this tube is
related to that of the manifold in a simple way and that it is not the mean
curvature motion of the hypersurface. A different approach, closer to what we
propose in the present paper, has also been proposed by Sapiro and collaborators
in a preliminary paper on tracking curves on a surface.

There are a number of problems with Ambrosio and Soner’s approach, the
main one being that it in some sense sweeps the dust under the rug: even though
it does correctly evolve the manifold of interest, it turns out that recovering this
manifold is in itself a major problem since it is not explicitly represented.

It is therefore natural to turn to a different approach and to attempt to
represent an arbitrary smooth manifold $\mathcal{M}$ of dimension $k$ as the intersection
of $n - k$ hypersurfaces; the evolution of the hypersurfaces is computed so as
to guarantee that their intersection evolves as required for $\mathcal{M}$ and that they
remain transverse. This approach is natural since it is based on the definition of
the dimension (or the codimension). It was suggested by Ambrosio and Soner in
[2] but not pursued because it was thought to be too difficult. The corresponding
program can nonetheless be studied, as described in [?]. We shall not pursue
it here because it does not easily handle objects such as the one represented in
figure 1, nor changes in the dimension of the manifold during the
evolution.

In 1998, Ruuth, Merriman, Xin and Osher [?], considering the particular
features of the complex Ginzburg-Landau equation, introduced another method
for evolving space curves according to the mean curvature motion. The curve is
represented by means of a (two-dimensional) complex function of unit magnitude
defined on $\mathbb{R}^3$ whose phase angle “winds” around the curve. The time evolution
is “diffusion-generated”, i.e. it is the consequence of a diffusion-renormalization
loop. Very convincing results are shown, demonstrating in particular the
possibility for the curve to have its topology altered during the evolution. Nevertheless,
this function is not defined at points of the curve of interest, which is a serious
disadvantage in the context of sampled functions.

Alternatively, we shall follow a slightly counterintuitive idea that was
proposed by Ruuth, Merriman and Osher in a discrete setting [?] and whose roots
are in a work by Steinhoff, Fan and Wang [?]. We were inspired by the last
section of this technical report and our paper elaborates on some of the suggestions
of these authors and generalizes them in a variety of directions. The idea is to
introduce redundancy in the representation of the manifold $\mathcal{M}$: instead of
representing it as the intersection of $n - k$ hypersurfaces, we propose to represent it as
the intersection of $n$ hypersurfaces. These hypersurfaces are related in a natural
manner to the distance of the points of $\mathbb{R}^n$ to $\mathcal{M}$ and evolve in such a way as to
guarantee that their intersection evolves according to the desired evolution for
$\mathcal{M}$. Introducing this redundancy allows for more flexibility in the representation:
manifolds with non-constant dimensions (in space) and boundaries such as the
one in figure 1 can now be represented and evolved. Their dimension can even
change in time, i.e. increase or decrease.

The plan of the paper is as follows. In section 2 we introduce our redundant
representation, called the Vector Distance Function (VDF), for arbitrary smooth
manifolds of dimension $k$. In section 3 we study some of its differential properties.
In section 4 we start looking at the problem of evolving a manifold by evolving
its VDF instead: we show that this problem has a very simple solution that
guarantees that the VDF remains a VDF at all times. In section 5 we discuss
the problem of manifolds with a boundary and we show that it is closely related
to changes in the dimension. We conclude and show a preliminary result in
section 6.

2 The Vector Distance Function (VDF) to a Smooth 
Manifold 

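Let $\mathcal{M}$ be a closed subset of $\mathbb{R}^n$. For every point $\mathbf{x}$ we denote by $\delta(\mathbf{x})$ the distance
$\mathrm{dist}(\mathbf{x}, \mathcal{M})$ of $\mathbf{x}$ to $\mathcal{M}$. This function is Lipschitz continuous and therefore almost
everywhere differentiable [?]. The same holds for the function $\eta(\mathbf{x}) = \frac{1}{2}\delta^2(\mathbf{x})$.
We denote by $\mathbf{u}(\mathbf{x})$ its derivative, defined almost everywhere:

$$\mathbf{u}(\mathbf{x}) = D\eta(\mathbf{x}) = \delta(\mathbf{x})\,D\delta(\mathbf{x}).$$

This equation shows that, since $\delta(\mathbf{x})$ satisfies a.e. the eikonal equation

$$\|D\delta\| = 1, \tag{1}$$

$\mathbf{u}(\mathbf{x})$ is a vector of length $\delta(\mathbf{x})$. Moreover, at a point $\mathbf{x}$ where $\delta$ is differentiable,
let $\mathbf{y} = P_{\mathcal{M}}(\mathbf{x})$ be the unique projection of $\mathbf{x}$ onto $\mathcal{M}$. This point is such that

$$\delta(\mathbf{x}) = \|\mathbf{x} - \mathbf{y}\|.$$

Besides, if $\mathcal{M}$ is smooth at $\mathbf{y}$, the vector $\mathbf{x} - \mathbf{y}$ is normal to $\mathcal{M}$ at $\mathbf{y}$ and parallel
to $D\delta(\mathbf{x})$:

$$\mathbf{u}(\mathbf{x}) = \mathbf{x} - \mathbf{y} = \mathbf{x} - P_{\mathcal{M}}(\mathbf{x}).$$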

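The vectors such as $\mathbf{x} - \mathbf{y}$, normal at $\mathbf{y}$ to $\mathcal{M}$, define the characteristics of the
distance function $\delta$. Starting from a point $\mathbf{y}$ of $\mathcal{M}$ and following a characteristic,
i.e. a direction in the normal space $N_{\mathbf{y}}\mathcal{M}$, we either go to infinity or reach a
point $\mathbf{z}$ at finite distance where $\delta$ is not differentiable and therefore $\mathbf{u}$ is not
defined. Such a point belongs to the skeleton of $\mathcal{M}$. Because of the previous
properties of the function $\mathbf{u}$, we have the following proposition.

Proposition 1. Let $\mathbf{x}$ be a point of $\mathbb{R}^n$ where $\mathbf{u}$ is defined. The relation

$$\mathbf{u}(\mathbf{x} + \alpha\,\mathbf{u}(\mathbf{x})) = (1 + \alpha)\,\mathbf{u}(\mathbf{x}) \tag{2}$$

holds true for all values of $\alpha$ with $+\infty \geq \alpha_m > \alpha > -1$, where $\mathbf{x} + \alpha_m\mathbf{u}(\mathbf{x})$ is the
first point on the characteristic where $\mathbf{u}$ is not defined.

Proof. Use the equation $\mathbf{u}(\mathbf{x}) = \mathbf{x} - P_{\mathcal{M}}(\mathbf{x})$ and the fact that $P_{\mathcal{M}}(\mathbf{x} + \alpha\mathbf{u}(\mathbf{x})) =
P_{\mathcal{M}}(\mathbf{x})$ for all $\alpha$ such that $\alpha_m > \alpha > -1$.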
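As an illustration, the following sketch evaluates the VDF of a circle centred at the origin of the plane and lets one check numerically that the length of u(x) equals the distance to the circle and that relation (2) holds. The circle, the sample points and the helper names (`vdf_circle`, `distance_circle`, `scaling_sides`) are assumptions chosen for this example; they do not come from the paper.

```python
import math

def vdf_circle(x, y, radius=1.0):
    """VDF of a circle centred at the origin: u(x) = x - P_M(x)."""
    r = math.hypot(x, y)
    if r == 0.0:
        # The center lies on the skeleton: the closest point is not unique.
        raise ValueError("VDF undefined at the center of the circle")
    scale = 1.0 - radius / r  # because P_M(x) = radius * x / |x|
    return (scale * x, scale * y)

def distance_circle(x, y, radius=1.0):
    """Distance delta(x) from the point (x, y) to the circle."""
    return abs(math.hypot(x, y) - radius)

def scaling_sides(x, y, alpha, radius=1.0):
    """Both sides of u(x + alpha*u(x)) = (1 + alpha)*u(x), for -1 < alpha < alpha_m."""
    ux, uy = vdf_circle(x, y, radius)
    lhs = vdf_circle(x + alpha * ux, y + alpha * uy, radius)
    rhs = ((1.0 + alpha) * ux, (1.0 + alpha) * uy)
    return lhs, rhs

if __name__ == "__main__":
    u = vdf_circle(1.2, 1.6)
    print("|u| =", math.hypot(*u), "delta =", distance_circle(1.2, 1.6))
    print(scaling_sides(1.2, 1.6, alpha=0.5))
```

For a point inside the unit circle the characteristic ends at the center, so in this example the admissible values are limited to α < 1 there.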

It may be interesting to pause here and remark that the VDF to a
smooth manifold $\mathcal{M}$ is an implicit representation of this manifold:

Proposition 2. Let $\mathcal{M}$ be a smooth closed manifold and $\mathbf{u}$ its VDF, defined
a.e. Then

$$\mathcal{M} = \mathbf{u}^{-1}(0). \tag{3}$$

In effect, $\mathcal{M}$ is the intersection of the $n$ hypersurfaces of equations $u_i(\mathbf{x}) = 0$, $i =
1, \ldots, n$. Since $\mathbf{u}$ represents $\mathcal{M}$ implicitly, the rank of the differential $D\mathbf{u}$ of $\mathbf{u}$,
which is defined a.e. if $\mathcal{M}$ is smooth, provides some interesting information about
the dimension of $\mathcal{M}$. Indeed, we have

Lemma 1. The codimension of $\mathcal{M} = \mathbf{u}^{-1}(0)$ is equal to the rank of the
differential $D\mathbf{u}(\mathbf{x})$ at points of $\mathcal{M}$.

Proof. This is a particular case of the implicit function theorem, cf. [?] for
details.

This latter fact is not particular to VDF’s but, as we shall see in the next
section, the relation between $D\mathbf{u}$ and the codimension of $\mathcal{M}$ is even more
remarkable in the case of VDF’s since the codimension of $\mathcal{M}$ can be determined
by the value of $D\mathbf{u}$ at points off $\mathcal{M}$.

Because we are interested in evolving the manifold $\mathcal{M}$ through the evolution
of $\mathbf{u}$, while keeping $\mathbf{u}$ a VDF, we are interested in finding a characterization of
the VDF’s analogous to the one for distance functions, (1). This will be our first
step in the exploration of the differential properties of the function $\mathbf{u}$ that will
be pursued in the next section.

Proposition 3. Let $\mathbf{u} : \mathbb{R}^n \to \mathbb{R}^n$ be such that

$$(D\mathbf{u})^T\mathbf{u} = \mathbf{u} \quad \text{a.e.} \tag{4}$$

and $\mathbf{u}$ is continuous at all points of the set $\mathcal{M} = \mathbf{u}^{-1}(0)$. Then $\mathbf{u}(\mathbf{x}) = D\eta(\mathbf{x})$
a.e., where $\eta(\mathbf{x})$ is the function $\frac{1}{2}\mathrm{dist}^2(\mathbf{x}, \mathcal{M})$ to the set $\mathcal{M}$.

Proof. Define $\phi(\mathbf{x}) = \|\mathbf{u}(\mathbf{x})\|$ and compute its first order derivative with respect
to $\mathbf{x}$:

$$\nabla\phi = \frac{(D\mathbf{u})^T\mathbf{u}}{\|\mathbf{u}\|} = \frac{\mathbf{u}}{\|\mathbf{u}\|}.$$

Hence $\|\nabla\phi\| = 1$ a.e., which means that $\phi$ is equal to the distance function to
the set $\mathcal{M}$ plus a constant: $\phi = \delta + C$. In addition, the combination of $\phi = \delta + C$
and $\phi = \|\mathbf{u}\|$ shows that $\mathbf{u} = (\delta + C)\nabla\delta$. The continuity of $\mathbf{u}$ on $\mathbf{u}^{-1}(0)$ implies
that $C = 0$. Indeed, let $\mathbf{x}_0$ be a point of $\mathcal{M}$ and $\mathbf{n}$ be a unit vector of $N_{\mathbf{x}_0}\mathcal{M}$, the
normal space of $\mathcal{M}$ at $\mathbf{x}_0$. We consider the line $\Delta : \mathbf{x}_0 + \lambda\mathbf{n}$ and the variations of
$\delta$ and $\nabla\delta$ along this line. The product $\delta\nabla\delta$ is continuous on $\mathcal{M}$ but $\nabla\delta$ is not.
Finally $\mathbf{u} = \phi\nabla\phi = \nabla\eta$. ∎

Equation (4) is the characteristic equation of the class of Vector Distance
Functions.
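The characteristic equation (4) can be checked numerically. The sketch below, which uses hypothetical helper names and a central-difference Jacobian chosen for this illustration, measures the residual of (Du)^T u - u at a point, first for the VDF of the unit circle and then for a field that is not a VDF.

```python
import math

def vdf_circle(p, radius=1.0):
    """VDF of a circle centred at the origin, evaluated at p = (x, y)."""
    x, y = p
    r = math.hypot(x, y)
    s = 1.0 - radius / r
    return (s * x, s * y)

def jacobian(f, p, h=1e-6):
    """Central-difference Jacobian with J[i][j] = d f_i / d x_j."""
    n = len(p)
    cols = []
    for j in range(n):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(tuple(plus)), f(tuple(minus))
        cols.append([(a - b) / (2.0 * h) for a, b in zip(fp, fm)])
    # cols[j][i] holds d f_i / d x_j, so transpose to get J[i][j].
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def characteristic_residual(f, p):
    """Largest absolute component of (Du)^T u - u at the point p."""
    u = f(p)
    J = jacobian(f, p)
    n = len(p)
    jt_u = [sum(J[i][j] * u[i] for i in range(n)) for j in range(n)]
    return max(abs(a - b) for a, b in zip(jt_u, u))

if __name__ == "__main__":
    print(characteristic_residual(vdf_circle, (1.2, 1.6)))                   # close to 0
    print(characteristic_residual(lambda p: (2 * p[0], 2 * p[1]), (1.2, 1.6)))  # not 0
```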

The previous proposition says nothing about the regularity of the set $\mathcal{M}$.
In fact, the proof of proposition 3 assumes that $\mathcal{M}$ is smooth enough to have
a normal space at every point. This is explained later in the paper. If we want
that set to be a smooth manifold, then it is likely that $\mathbf{u}$ must satisfy some
extra regularity conditions. We have not pursued this direction but a clue can
be found in the paper [?] where it is shown that, given a smooth manifold $\mathcal{M}$, the
first and second fundamental forms of $\mathcal{M}$ can be recovered from the third order
derivatives of the distance function $\delta$ to $\mathcal{M}$ at all points where it is differentiable.

To provide the reader with some intuition, we show in figure 2 the VDF of a
smooth manifold of dimension 1, a circle, embedded in $\mathbb{R}^2$.

3 Properties of VDF’s 

We now study some differential properties of the VDF $\mathbf{u}$. Most of them can be
found in [2] and the others are proved in [?].

Equation (2) yields, for $\alpha = -1$, the (almost obvious) equation

$$\mathbf{u}(\mathbf{x} - \mathbf{u}(\mathbf{x})) = 0. \tag{5}$$

We use this equation to show the following propositions.

Proposition 4. The derivative of a VDF satisfies the following relation at each
point of $\mathcal{M}$:

$$D\mathbf{u} = (D\mathbf{u})^2. \tag{6}$$

Therefore it is a projector on a vector subspace and we have the following
proposition.

Proposition 5. When evaluated at $\mathbf{x}$ on $\mathcal{M}$, $D\mathbf{u}(\mathbf{x})$ is the projector on the
normal space $N_{\mathbf{x}}\mathcal{M}$. This implies that $D\mathbf{u}(\mathbf{x})$ has $n - k$ (the dimension of $N_{\mathbf{x}}\mathcal{M}$)
eigenvalues equal to 1 and $k$ (the dimension of $T_{\mathbf{x}}\mathcal{M}$) eigenvalues equal to 0. In
particular, the rank of $D\mathbf{u}$ is equal to $n - k$ on $\mathcal{M}$.

Fig. 2. The left figure shows that the VDF of a circle is a radial field equal
to 0 on the circle and undefined at the center. The right figure proposes an
alternate visualization of VDF’s: since VDF’s are vector fields, it might be useful
to visualize separately the direction and the magnitude fields. In this figure,
the direction field of the VDF of a circle is represented by unit vectors and
its magnitude is visualized by some of its level-sets (dashed lines). Any vector
function can be represented in that way but in the case of VDF’s, the dashed
lines are parallel to the represented curve and the unit vectors are normal to the
dashed lines.

We now come to the study of the eigenvalues and eigendirections of $D\mathbf{u}$
outside $\mathcal{M}$. The results are quite simple if we consider the line defined by the
two points $\mathbf{x}$ and $P_{\mathcal{M}}(\mathbf{x})$: let $\mathbf{n}$ be the unit vector parallel to $\mathbf{x} - P_{\mathcal{M}}(\mathbf{x})$ and
consider the line $s \to P_{\mathcal{M}}(\mathbf{x}) + s\mathbf{n} = \mathbf{x}(s)$; such a line is called a characteristic
line. Consider further the values $s_{min} < 0$ and $s_{max} > 0$ (possibly infinite) such
that $D\mathbf{u}(\mathbf{x}(s))$ is defined on the open interval $I = ]s_{min}, s_{max}[$; the eigendirections
of $D\mathbf{u}(\mathbf{x}(s))$ are constant and $n - k$ eigenvalues are equal to 1 for all $s$ in $I$. Such
an open segment is called a characteristic segment. More precisely, we have the
following proposition.

Proposition 6. The eigendirections of the symmetric matrix $D\mathbf{u}$ are constant
along each characteristic segment.

Moreover, if a ray is parameterized by its arc-length $s$, starting at $P_{\mathcal{M}}(\mathbf{x})$, then
each one of the eigenvalues of $D\mathbf{u}$ has one of the three following forms:

$$\lambda(s) = 1, \quad \text{or} \quad \lambda(s) = 0, \quad \text{or} \quad \lambda(s) = \frac{s}{s \pm c},\ c > 0, \qquad \forall s \in I, \tag{7}$$

where $c$ depends on the particular eigendirection. The first form corresponds to
the eigenvectors of $D\mathbf{u}$ which are elements of the normal space to $\mathcal{M}$ at $P_{\mathcal{M}}(\mathbf{x})$;
there are $n - k$ such eigenvalues. The second and third forms (the second form
is obtained from the third form by taking $c = \infty$) correspond to eigenvectors of
$D\mathbf{u}$ which are elements of the tangent space to $\mathcal{M}$ at $P_{\mathcal{M}}(\mathbf{x})$. There are $k$ such
eigenvalues.
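For a circle of radius R in the plane (n = 2, k = 1) these forms can be made explicit: differentiating u(x) = x - R x/|x| by hand gives one eigenvalue equal to 1 (normal direction) and one tangential eigenvalue of the third form with c = R. The sketch below checks this at one exterior and one interior point; the closed-form Jacobian and all function names are assumptions made for this example.

```python
import math

def du_circle(x, y, radius=1.0):
    """Jacobian of u(x) = x - radius * x / |x| (derived by hand for this sketch)."""
    r = math.hypot(x, y)
    k = 1.0 - radius / r
    c = radius / r ** 3
    return ((k + c * x * x, c * x * y),
            (c * x * y, k + c * y * y))

def sym_eigenvalues_2x2(m):
    """Eigenvalues (smallest first) of a symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    return (mean - half_gap, mean + half_gap)

def tangential_eigenvalue(s, c, outside=True):
    """Third form of (7): s/(s + c) outside the circle, s/(s - c) inside."""
    return s / (s + c) if outside else s / (s - c)

if __name__ == "__main__":
    # Exterior point at arc-length s = 1 from the unit circle.
    print(sym_eigenvalues_2x2(du_circle(1.2, 1.6)), tangential_eigenvalue(1.0, 1.0))
    # Interior point at arc-length s = 0.5 from the unit circle.
    print(sym_eigenvalues_2x2(du_circle(0.3, 0.4)),
          tangential_eigenvalue(0.5, 1.0, outside=False))
```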

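There is also a remarkable relation between the second order spatial
derivative of $\mathbf{u}$ and the mean curvature of $\mathcal{M}$.

Proposition 7. The mean-curvature vector of $\mathcal{M}$, of dimension $k$, at $\mathbf{x}$ is equal,
up to a scale factor, to the Laplacian $\Delta\mathbf{u}(\mathbf{x})$ of $\mathbf{u}$ at the same point:

$$\mathcal{H}(\mathbf{x}) = -\frac{1}{k}\,\Delta\mathbf{u}(\mathbf{x}), \qquad \forall \mathbf{x} \in \mathcal{M}.$$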
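This relation can be tested on a circle of radius R, whose mean curvature vector points towards the center with magnitude 1/R. The sketch below approximates the Laplacian of the VDF with a five-point stencil; the stencil, the step size and the function names are choices made for this illustration only.

```python
import math

def vdf_circle(x, y, radius):
    """VDF of a circle of the given radius centred at the origin."""
    r = math.hypot(x, y)
    s = 1.0 - radius / r
    return (s * x, s * y)

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference Laplacian of a 2-vector valued function."""
    center = f(x, y)
    neighbours = [f(x + h, y), f(x - h, y), f(x, y + h), f(x, y - h)]
    return tuple((sum(v[i] for v in neighbours) - 4.0 * center[i]) / (h * h)
                 for i in range(2))

def mean_curvature_vector(x, y, radius, k=1):
    """Estimate of -(1/k) * Laplacian(u) at a point (x, y) of the circle."""
    lap = laplacian(lambda a, b: vdf_circle(a, b, radius), x, y)
    return tuple(-c / k for c in lap)

if __name__ == "__main__":
    # (1.2, 1.6) lies on the circle of radius 2; the outward normal is (0.6, 0.8).
    print(mean_curvature_vector(1.2, 1.6, radius=2.0))
```

At this point the estimate should be close to minus the outward normal divided by the radius.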

4 How to Evolve a Smooth Manifold by Evolving Its 
VDF 

Let us consider a family $\mathcal{M}(p,t)$ of smooth manifolds of dimension $k$, where $p$
is a $k$-dimensional vector parameterizing $\mathcal{M}$ at each time instant $t$. We assume
the initial conditions

$$\mathcal{M}(\cdot, 0) = \mathcal{M}_0(\cdot),$$

where $\mathcal{M}_0$ is a smooth manifold of dimension $k$. Furthermore the evolution of
the family $\mathcal{M}$ is governed by the following PDE

$$\mathcal{M}_t(p,t) = \mathcal{H}(\mathcal{M}(p,t),t) + \Pi^N_{\mathcal{M}(p,t)}\big(\mathcal{D}(\mathcal{M}(p,t),t)\big) \overset{\mathrm{def}}{=} \mathcal{V}(\mathcal{M}(p,t),t), \tag{8}$$

where $\mathcal{H}(\mathcal{M}(p,t),t)$ is the mean curvature vector at the point $\mathcal{M}(p,t)$ of the
manifold $\mathcal{M}$ and $\mathcal{D}(\mathbf{x},t)$ is a vector field defined on $\mathbb{R}^n \times \mathbb{R}^+$ representing a
velocity induced on $\mathcal{M}$ by some data. $\Pi^N_{\mathcal{M}(p,t)}$ is the projection operator on the
normal space $N_{\mathcal{M}(p,t)}\mathcal{M}$ to $\mathcal{M}$ at the point $\mathcal{M}(p,t)$. The goal of this section is
to explore a way of evolving $\mathbf{u}$, the VDF to $\mathcal{M}$, instead of $\mathcal{M}$ while guaranteeing
three conditions:

i) That $\mathbf{u}$ remains a VDF at all time instants.
ii) That $\mathbf{u}^{-1}(0) = \mathcal{M}$ at all times where $\mathcal{M}$ is defined.
iii) That the manifold $\mathcal{M}$ evolves according to (8).

Since the value of the field $\mathbf{b} = \mathbf{u}_t$ is needed in order to evolve $\mathbf{u}$, we shall
now develop the idea that $\mathbf{b}$ is itself the solution to a certain PDE which can be
solved numerically.

It is not too difficult, from (4), to find a characterization of $\mathbf{b}$.

Proposition 8. The velocity field $\mathbf{b}$ of the VDF $\mathbf{u}$ is characterized by the first
order, quasi-linear Partial Differential Equation

$$D\mathbf{b}\,\mathbf{u} = (I - D\mathbf{u})\,\mathbf{b}, \tag{9}$$

with initial conditions

$$\mathbf{b}(\mathcal{M}(p,t),t) = -\mathcal{V}(\mathcal{M}(p,t),t), \tag{10}$$

wherever $\mathbf{u}$ is defined and differentiable.

Proof. Take the time derivative of equation (4) and use the symmetry of $D\mathbf{u}$
(hence of $D\mathbf{b}$).

We propose to solve (9) numerically with the initial condition (10): this
provides us with the value of the field $\mathbf{u}_t$ needed to evolve (i.e. update) $\mathbf{u}$ and
is therefore a practical way of evolving the VDF.

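A geometrical interpretation of this is the following.

Proposition 9. At all points $\mathbf{x}$ where $\mathbf{b}$ is defined, its component in the normal
space to $\mathcal{M}$ at the point $P_{\mathcal{M}}(\mathbf{x})$ is equal to minus the normal velocity
$\mathcal{V}(P_{\mathcal{M}}(\mathbf{x}), t)$ of the point $P_{\mathcal{M}}(\mathbf{x})$:

$$\mathbf{b}(\mathbf{x})\big|_{N_{P_{\mathcal{M}}(\mathbf{x})}\mathcal{M}} = \mathbf{b}(\mathbf{x} - \mathbf{u}(\mathbf{x},t))\big|_{N_{P_{\mathcal{M}}(\mathbf{x})}\mathcal{M}} = \mathbf{b}(P_{\mathcal{M}}(\mathbf{x}))\big|_{N_{P_{\mathcal{M}}(\mathbf{x})}\mathcal{M}} = -\mathcal{V}(P_{\mathcal{M}}(\mathbf{x}),t). \tag{11}$$

Proof. We use the method of characteristics (see e.g. [?]) and rewrite (9) along
the characteristics of $\mathbf{b}$ so that a set of ODE’s is obtained (cf. [?] for a detailed
and instructive proof).

As far as the tangential components are concerned, their variation along the
characteristic lines can be related to the coefficients of proposition 6 (cf. [?]).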

5 Smooth Manifolds with a Boundary and Changes in 
Dimension 

A simple example will introduce the new issues of this section. In the plane
$\mathbb{R}^2$, we consider the time dependent segment $[A(t), B(t)]$ whose endpoints have
respectively the velocities $\mathbf{v}_A(t) = \alpha\,\mathbf{i}$ and $\mathbf{v}_B(t) = \beta\,\mathbf{i}$ with $\alpha < 0 < \beta$ and with
initial condition $A(0) = B(0) = O$. This situation describes a point transforming
into a segment with increasing length: it is a prototype of a change of dimension
followed by the evolution of a smooth manifold with boundary. We denote by $\mathbf{x} =
[x, y]^T$ the coordinates of a point of $\mathbb{R}^2$. The VDF to this object is easily shown
to satisfy

$$\mathbf{u}_t = -D\mathbf{u}\,\mathcal{V}, \qquad \mathcal{V}(\mathbf{x},t) = \begin{cases} \mathbf{v}_A & \text{if } x < \alpha t, \\ 0 & \text{if } \alpha t < x < \beta t, \\ \mathbf{v}_B & \text{if } x > \beta t. \end{cases} \tag{12}$$

This equation is a variant of a well-known class of PDE’s called the transport
equations [?]. One of its solutions is readily shown to be

$$\mathbf{u}(\mathbf{x},t) = \begin{cases} \mathbf{u}_0(\mathbf{x} - A) & \text{if } x < \alpha t, \\ \mathbf{u}_0(0, y) & \text{if } \alpha t < x < \beta t, \\ \mathbf{u}_0(\mathbf{x} - B) & \text{if } x > \beta t. \end{cases}$$

At first glance (see figure 4) the vector field $\mathbf{b} = -D\mathbf{u}\,\mathcal{V}$ is not of the form
presented in the previous section because it is only piecewise-smooth, $\mathbf{b}$ being
discontinuous on the lines of equations $x = \alpha t$ and $x = \beta t$. But, as shown in the
figure, the evolution of the point $O$ is quite remarkable: it is a smooth manifold of
dimension 0 which is turned into a smooth manifold with boundary (the segment
$AB$). The velocity field $\mathbf{b}$ is discontinuous on the vertical axis at time $t = 0$,
which has the effect of allowing the point to “spread” to a line segment. At time
$t > 0$ the velocity field $\mathbf{b}(\cdot, t)$ is of the form $\mathcal{V}(P_{AB}(\cdot))$ everywhere except on the
previous two lines. We note that the normal component of $\mathcal{V}$ is continuous across
these two lines, being equal to 0 everywhere, while its tangential component is
discontinuous over them. This last point is the reason why the segment $[AB]$
can grow in time.
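The segment example can be reproduced with a few lines of code. The sketch below assumes the segment lies on the x-axis with A(t) = (αt, 0) and B(t) = (βt, 0), computes its VDF by clamping the abscissa onto [A, B], and estimates b = u_t by a forward difference in time; the numerical values of α and β and the function names are hypothetical choices for this illustration.

```python
def vdf_segment(x, y, t, alpha=-1.0, beta=2.0):
    """VDF of the segment [A(t), B(t)] on the x-axis: u = x - P_AB(x)."""
    ax, bx = alpha * t, beta * t
    px = min(max(x, ax), bx)  # abscissa of the closest point of the segment
    return (x - px, y)

def velocity_field(x, y, t, alpha=-1.0, beta=2.0, dt=1e-6):
    """Forward-difference estimate of b = u_t at the point (x, y)."""
    u_now = vdf_segment(x, y, t, alpha, beta)
    u_next = vdf_segment(x, y, t + dt, alpha, beta)
    return tuple((b - a) / dt for a, b in zip(u_now, u_next))

if __name__ == "__main__":
    t = 1.0
    for point in [(-3.0, 1.0), (0.5, 1.0), (5.0, 1.0)]:
        print(point, vdf_segment(*point, t), velocity_field(*point, t))
```

With these values the estimated field is constant on each of the three regions and jumps across the lines x = αt and x = βt, while its component along the y-axis stays zero, which matches the discussion of the tangential and normal components above.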

The situation we have just described is archetypal of all cases in higher
dimensions and codimensions. It reveals an undesirable lack of generality in
the analysis of the previous section and suggests that non-continuous $\mathbf{b}$’s may
also be interesting since they can account for changes of dimension and tangent
velocities at the boundary of a manifold (both are intimately related, as can
be learned from the segment example).

In order to deal with these new issues, it is necessary in the first place to
provide ourselves with a model for the intuitive but vague notion of “changing
dimension”. We shall say that an initial smooth manifold $W$ of dimension $m < n$
at $t = 0$ increases its dimension if it “spreads” out in the direction of some
privileged normal directions and becomes another manifold $\mathcal{M}(t)$, $t > 0$, of
higher dimension $k \in \{m + 1, \ldots, n\}$ with a smooth boundary $\partial\mathcal{M}$ of dimension
$k - 1$. The increment in dimension is equal to the number of linearly independent
orthogonal normal directions where this filling occurs. The modeling of a decrease
in dimension is obtained by reversing the direction of time. This spreading follows
the path of some geodesic curves of the ambient space that are thrown in all of the
chosen normal directions^1, starting from $W$. For instance, in the example above,
the initial manifold $W$ is the point $O$, the target manifold $\mathcal{M}$ is the segment
$[AB]$, the geodesic paths followed to spread the point $O$ are the segments $[OA]$
and $[OB]$ (of course, depending on the chosen metric, the geodesic paths may
not be straight lines so that $O$ could have been spread to a “curved” curve as
well) and the boundary $\partial\mathcal{M}$ of the target manifold is made of the two end points
$A$ and $B$. These simple notions generalize naturally in any dimension.

^1 See [?] for important results concerning the exponential map and its use in the
study of neighborhoods.

In this model, there is a special manifold, denoted $R_{\partial\mathcal{M}}$, which plays a
singular role. Indeed, consider the “ruled” hypersurface whose generatrix is $\partial\mathcal{M}$
and whose ruling directions are all the normal directions of $\mathcal{M}$. In the above
example, $R_{\partial\mathcal{M}}$ is made of the two lines of equations $x = \alpha t$ and $x = \beta t$ since
they are “ruled” hypersurfaces starting at $A$ and $B$ (which together form $\partial\mathcal{M}$) and
directed along the normal to $[AB]$ (which plays the role of $\mathcal{M}$). Then we have the
two following propositions (cf. [11] for proofs) generalizing the observations made
on the example above.

Proposition 10. The spatial derivative $D\mathbf{u}$ of the VDF to the smooth manifold
$\mathcal{M}$ with boundary $\partial\mathcal{M}$ is discontinuous on the special ruled hypersurface $R_{\partial\mathcal{M}}$
defined above, generated by $\partial\mathcal{M}$ and the normal space to $\mathcal{M}$ at points of $\partial\mathcal{M}$.

Proposition 11. The tangential component of the time derivative $\mathbf{u}_t$ of the
VDF to the smooth manifold $\mathcal{M}$ with boundary $\partial\mathcal{M}$ can be discontinuous on the
hypersurface $R_{\partial\mathcal{M}}$ generated by $\partial\mathcal{M}$ and the normal space to $\mathcal{M}$ at points of
$\partial\mathcal{M}$. The normal component is continuous.

We have characterized the singularities of the VDF to a manifold which has
a boundary. It is also possible to go a little further and see what singularities
exist at the very moment of the change of dimension (i.e. at $t = 0$ in our
simple example). It suffices to study the limit $t \to 0$, and this is done in [11]. These
two propositions will hopefully be used to design appropriate numerical schemes
that would deal with these singularities.

6 Some Remarks and Conclusion 

The method of the VDF’s for representing arbitrary smooth manifolds with or
without a boundary finds its roots in the work of Ambrosio and Soner [2], where
we found inspiration and some of the technical results that we needed, and in
the work of Ruuth, Merriman, Xin and Osher [?], which inspired the idea of
the VDF representation. Our contributions are the development of a) a method
for evolving a VDF instead of a smooth closed manifold while guaranteeing
that it stays a VDF over time and that the manifold evolves correctly, b) a
theory that describes changes of dimension by a generalized transport equation
and c) a theory that extends a) to deal seamlessly with smooth manifolds with
boundaries. Moreover, in our approach, the function we evolve is regular on the
manifold of interest, unlike for example the method presented in [?].

These theoretical developments are being implemented. Let us make two
remarks concerning this implementation. The first remark is related to the
computation of the velocity field $\mathbf{b}$. Even though proposition 9 provides a
characterization of $\mathbf{b}(\mathbf{x})$ in terms of $\mathcal{V}(\mathbf{x} - \mathbf{u}(\mathbf{x}))$, our method is to compute $\mathbf{b}$ by applying
proposition 8 and solving the quasi-linear PDE (9) with initial conditions (10).

The second remark is that because of the accumulation of numerical errors,
the function $\mathbf{u}$ may drift away from the class of VDF’s. In order to correct for
this drift, we suggest that it may be a good idea to combine the solution of
$\mathbf{u}_t = \mathbf{b}$ with that of

$$\mathbf{u}_t = \big((D\mathbf{u})^T + (a - 1)I\big)\,\mathbf{u}, \tag{13}$$

where

$$a = \mathrm{Trace}\big(D\mathbf{u}\,(D\mathbf{u})^T\big) + \mathbf{u}\cdot\Delta\mathbf{u} - \mathrm{div}\,\mathbf{u}.$$

Equation (13) is the Euler-Lagrange equation of the following functional

$$\frac{1}{2}\int_{\Omega} \|(D\mathbf{u})^T\mathbf{u} - \mathbf{u}\|^2\, d\mathbf{x},$$

where $\Omega$ is some neighborhood of $\mathcal{M}$. This functional arises naturally from
proposition 3.

We have implemented this VDF reprojection and the numerical results are
very good (cf. figure 3).
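To give a feeling for how this functional measures the drift away from the class of VDF’s, the sketch below evaluates a discrete version of it on a regular grid of the unit square. It is not the reprojection scheme of the paper: the grid, the central differences, the Riemann sum and the perturbation are assumptions chosen for this illustration, and the function names are hypothetical.

```python
import math

def vdf_energy(u, n=41, lo=0.0, hi=1.0):
    """Riemann sum of 0.5 * ||(Du)^T u - u||^2 over the interior grid nodes.

    u maps (x, y) to a 2-vector; derivatives use central differences.
    """
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            x, y = lo + i * h, lo + j * h
            u0 = u(x, y)
            dx = [(a - b) / (2.0 * h) for a, b in zip(u(x + h, y), u(x - h, y))]
            dy = [(a - b) / (2.0 * h) for a, b in zip(u(x, y + h), u(x, y - h))]
            # Component j of (Du)^T u - u is sum_i (d u_i / d x_j) * u_i - u_j.
            w1 = dx[0] * u0[0] + dx[1] * u0[1] - u0[0]
            w2 = dy[0] * u0[0] + dy[1] * u0[1] - u0[1]
            total += 0.5 * (w1 * w1 + w2 * w2) * h * h
    return total

def exact_vdf(x, y):
    """VDF to the vertical line x = 0."""
    return (x, 0.0)

def perturbed_vdf(x, y):
    """The same field with an arbitrary smooth perturbation added."""
    return (x + 0.1 * math.sin(3.0 * y), 0.1 * x * y)

if __name__ == "__main__":
    print("exact VDF:    ", vdf_energy(exact_vdf))
    print("perturbed VDF:", vdf_energy(perturbed_vdf))
```

The value is essentially zero for the exact VDF and strictly positive for the perturbed field, so a quantity of this kind could serve as a simple monitor of the drift during an evolution.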



Fig. 3. The VDF to the vertical line of equation $x = 0$, i.e. the left border
of the grid, is perturbed in an exaggerated way (left image) and the resulting
function is used as the initial condition for equation (13), which is solved using
a regular forward differences scheme. After a few iterations (center and
right images), the function is transformed back into the correct VDF to the line of
equation $x = 0$. See figure 2 to understand the visualization.




To conclude, we think that the VDF method for representing and evolving
shapes has the following advantages: it can deal with smooth manifolds with and
without boundaries and with shapes of different dimensions; if discontinuous velocity
fields are allowed, the dimension can change; the evolution method that we propose
guarantees that we stay in the class of VDF’s and therefore that the intrinsic
properties of the underlying shapes, such as their dimension and curvatures, can be
read off easily from the VDF and its spatial derivatives. The main disadvantage
is its redundancy: the size of the representation is always that of the ambient
space even though the object we are representing is of a much lower dimension.
This disadvantage is also one of its strengths since it buys us flexibility.




Fig. 4. Changing dimension: If we allow the velocity field $\mathbf{b}$ to be discontinuous,
we can induce changes in the dimension of the manifold. The example shows the
simplest of all manifolds, a single point $O$ in the plane and its VDF (upper
left-hand corner); the velocity field $\mathbf{b}(\cdot, 0) = -D\mathbf{u}\,\mathcal{V}$ is discontinuous on the vertical
axis (upper right-hand corner). At a later time $t > 0$ the point $O$ has become
the line segment $[AB]$ (its VDF is shown in the lower left-hand corner); the
new velocity $\mathcal{V}(\cdot, t)$ is shown in the lower right-hand corner. The initial point,
a smooth manifold without boundary of dimension 0, is turned into a closed (in
the topological sense) open (in the usual sense) curve, a smooth manifold with
boundary (the two endpoints $A$ and $B$) of dimension 1.

Abstract. In this paper we present a least committed multi-scale
method for computation of optic flow fields. We extract optic flow fields
from normal flow, by fitting the normal components of a local polynomial
model of the optic flow to the normal flow. This fitting is based on an
analytically solvable optimization problem, in which an integration
scale-space over the normal flow field regularizes the solution. An automatic
local scale selection mechanism is used in order to adapt to the local
structure of the flow field. The performance profile of the method is
compared with that of existing optic flow techniques and we show that
the proposed method performs at least as well as the leading algorithms
on the benchmark image sequences proposed by Barron et al. [3]. We
also do a performance comparison on a synthetic fire particle sequence
and apply our method to a real sequence of smoke circulation in a pigsty.
Both consist of highly complex non-rigid motion.



1 Introduction 

Motion analysis is a large topic within computer vision and image analysis, 
because knowledge of motion, as perceivable in image sequences, is necessary for 
various tasks such as object tracking, time-to-contact, structure from motion, 
etc. Motion analysis is conducted by associating a vector field of velocities to 
the image sequence, which describes the rate and direction of change of the 
intensity values. We define the optic flow as the velocity field, which describes 
the temporal changes of the intensity values of the sequence. Optic flow is not 
equivalent to a projection of the true physical motion onto the image plane, 
among other things, because of changing intensity values caused by reflections 
and varying lighting conditions. The main reason, however, is the aperture 
problem, as coined by Marr. The aperture problem is the fact that we can 
only deduce the motion that results in a change in the intensity patterns of the 
captured image sequence. The problem of determining the motion along image 
isophotes (i.e. iso-intensity curves) is inherently ambiguous. By introduction of 
assumptions about the flow we can obtain various types of realizations of optic 
flow. Usual assumptions used in other methods are local or global rigidity or 
affinity. The only unbiased local motion we can obtain is the motion orthogonal 
to the isophotes of the image. This type of velocity field is called the normal 
flow. 

A large variety of methods exist for estimation of optic flow. See Barron 
et al. [3] for a discussion and evaluation of various methods. 

We wish to develop an algorithm that can be used for motion analysis in 
experimental fluid dynamics. For that reason it is important that 
it is least committed, in order to reduce possible biases in the approximated 
flow. Since normal flow is the only unbiased or least committed realization of 
optic flow, we use it as a basis for obtaining an estimate of the full flow. We 
believe that a least committed estimate can be obtained by locally modeling the 
full optic flow on top of the normal flow. Ideally we seek to estimate the optic 
flow field which at each point has the corresponding normal flow vector as its 
normal component^1. This constraint fixes only one degree of freedom (d.o.f.), 
which means that an infinite number of solutions exist. In order to circumvent 
this problem, we introduce a polynomial model of the local optic flow. We use a 
linear Gaussian scale-space to formalize the concept of local validity, or the 
integration scale, of our model. This integration scale-space regularizes the 
problem and lets us fix the missing tangential d.o.f. by estimating, in a least 
squares sense, the parameters of our local model constrained by the normal flow 
found at the integration scale, that is, in the region of model validity. The 
solution to this constrained minimization problem can be stated in closed form 
and is expressed in terms of a Taylor expansion of the normalized structure 
tensor. In order to take full advantage of this multi-scale approach we use an 
automatic local scale selection mechanism, based on a method by Niessen et al., 
to select the scale of model validity. 

In this paper we compute the normal flow by using the method proposed by 
Florack et al. It is an incorporation of the so-called Optic Flow Constraint 
Equation (OFCE), originally proposed by Horn and Schunck, into the linear 
Gaussian scale-space formalism. In this method the normal flow is in general 
modeled by an M’th order polynomial, but we choose the somewhat restrictive 
zeroth order model. This choice is based on the assumption that locally this 
model is a good approximation of the normal flow. Other choices of model order 
and normal flow methods are possible. 

We believe that scale-space integration of normal flow and modeling locally 
the optic flow in this scale-space is different from other methods. The conscious 
inclusion of measurement scale lets us select the appropriate neighborhood in 
which our local model is valid. Other authors have introduced polynomial 
models of the flow structure directly into the OFCE, contrary to modeling 
the optic flow on top of the normal flow as we propose. An integration 
neighborhood for the approximation of optic flow, and for the related stereo 
matching problem, has been used by several authors. 

We evaluate the performance of our algorithm by using the methods proposed 
by Barron et al. [3] in their survey of the performance of optic flow methods. 
Our performance results are compared with the results for other optic flow 
techniques on, among others, a synthetic fire sequence. In order to show a possible 
application of this least committed approach, we also apply our method to a real 
sequence of smoke circulation in a pigsty. 

^1 By normal component we mean the projection of the sought optic flow onto the 
direction orthogonal to the local isophote. 

The organization of this paper is the following: In Sec. 2 we describe our 
scale-space representation as well as some notation. Sec. 3 is a brief introduction 
to the theory of normal flow. The proposed method for computation of optic flow 
is presented in Sec. 4 and finally, the performance of the presented algorithm is 
discussed in Sec. 5. 



2 Spatiotemporal Gaussian Scale-Space 



Both Koenderink and Lindeberg have proposed temporally causal scale-space 
representations. In this paper we choose simply to disregard temporal 
causality, since we are not interested in real-time applications. It is our opinion 
that a generalization of our method to one of the above-mentioned temporally 
causal representations is possible. We choose to use the linear Gaussian 
scale-space representation, introduced among others by Koenderink. 

We use the spatiotemporal scale-space representation 
$L(x;\sigma,\tau) : \mathbb{R}^{N+1} \times \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$ 
of the spatiotemporal image $f(x) : \mathbb{R}^{N+1} \to \mathbb{R}$, defined by 
the convolution of the image with the scale-space aperture function 
$G(x;\sigma,\tau) : \mathbb{R}^{N+1} \times \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$, 

$$L(x;\sigma,\tau) = f(x) * G(x;\sigma,\tau). \qquad (1)$$



The aperture function is the spatiotemporal Gaussian 

$$G(x;\sigma,\tau) = \frac{1}{\sqrt{2\pi\tau^2}\,(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{x_i x^i}{2\sigma^2} - \frac{(x^t)^2}{2\tau^2}\right), \qquad (2)$$

where $\sigma$ and $\tau$ are called the spatiotemporal scale parameters. Throughout this 
paper, we use index notation where $x^i$ denotes the i'th spatial component 
of the vector $x$, and $x^t$ denotes the temporal component. When we are talking 
about both spatial and temporal components we write the component index with 
a Greek letter, e.g. $x^\mu$. We will also sometimes make use of Einstein's summation 
convention, i.e. repeated lower and upper indices indicate summation over these, 
$x_i x^i = \sum_i x_i x^i$. 
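As a rough illustration of Eqs. (1) and (2), the following sketch builds a sampled, normalized 1-D Gaussian and smooths a list of samples with it. Because the Gaussian is separable, the same routine could in principle be applied along each spatial axis with the spatial scale and along the temporal axis with the temporal scale. The function names, the truncation radius of four standard deviations and the clamping at the borders are assumptions made for this example and are not taken from the paper.

```python
import math


def gaussian_kernel(scale, radius=None):
    """Sampled 1-D Gaussian with standard deviation `scale`, normalized to sum to 1.

    The truncation radius (4 * scale by default) is an assumption of this sketch.
    """
    if radius is None:
        radius = int(math.ceil(4 * scale))
    weights = [math.exp(-(k * k) / (2.0 * scale * scale))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, scale):
    """Convolve a 1-D list with the sampled Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(scale)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp to the valid index range
            acc += w * signal[j]
        out.append(acc)
    return out
```

Calling `smooth(samples, sigma)` on every row and column and `smooth(samples, tau)` along time would give one level of such a spatiotemporal representation, under the stated assumptions.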



3 Theory of Normal Flow 

We will in this paper assume that we can compute the normal flow of an image 
sequence. We choose to use the scale-space OFCE proposed by Florack et al. 
as a method for computation of normal flow. We briefly outline this method 
in order to establish some necessary notation. 

Horn and Schunck propose an optic flow method which is based on the 
local assumption that the image intensities are preserved at points moving along 
the flow. Furthermore, it is assumed that the temporal velocity component is 
constant, $v^t(x) = 1$. This assumption expresses that the flow is everywhere 
non-vanishing and temporally causal. These assumptions lead to the well-known 
optic flow constraint equation (OFCE). It is interesting to notice that the latter 
assumption breaks down generically in a countable set of points. In this 
paper we will assume that this assumption is valid at every point in the image. 

Florack et al. incorporate the ideas of Horn and Schunck into the scale-space 
paradigm, by defining the optic flow constraint equation under the 
spatiotemporal scale-space aperture. For a 2-dimensional image sequence $f(x)$, the 
OFCE can be written as the Lie derivative^2 under the scale-space aperture 
$G(x;\sigma,\tau)$ of the intensity function of the sequence along the flow $v(x)$. We 
have 

$$\mathcal{L}_v L(x;\sigma,\tau) = \int_{x \in \mathbb{R}^3} (f_t + v^x f_x + v^y f_y)\, G\, dx = 0, \qquad (3)$$

where $f_\mu = \partial f / \partial x^\mu$. 

Normal flow is defined by the so-called normal flow constraint. Florack et al. 
define the normal flow constraint under the scale-space aperture function as 

$$\int_{x \in \mathbb{R}^3} (v^x f_y - v^y f_x)\, G\, dx = 0. \qquad (4)$$
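To make the two constraints concrete, the sketch below solves them at a single point, without the scale-space aperture. This pointwise simplification is an assumption of the example and is not the multi-scale scheme of Florack et al. Given the image derivatives, the OFCE and the normal flow constraint together determine the flow component along the gradient. The function name and the small threshold `eps` are hypothetical.

```python
def normal_flow(fx, fy, ft, eps=1e-12):
    """Pointwise normal flow (ux, uy) from image derivatives.

    Solves  ft + ux*fx + uy*fy = 0  (optic flow constraint)
    and     ux*fy - uy*fx = 0       (normal flow constraint),
    i.e. u = -ft * grad(f) / |grad(f)|^2.
    Returns (0.0, 0.0) where the gradient vanishes (an assumption of this sketch).
    """
    g2 = fx * fx + fy * fy
    if g2 < eps:
        return (0.0, 0.0)
    s = -ft / g2
    return (s * fx, s * fy)
```

For example, `normal_flow(3.0, 4.0, -10.0)` gives a vector parallel to the gradient direction (3, 4) that satisfies both constraints.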

Florack et al. introduce a polynomial model of the flow $v$ into (3) and (4). 
This lets them state the problem of normal flow estimation in terms of a system 
of linear equations. Each term of these equations can be expressed as a linear 
combination of scale-space image derivatives, due to the definition of Hermite 
polynomials in terms of Gaussian derivatives. As mentioned, we will only use 
the zeroth order model for approximation of normal flow, but we will use the 
M'th order polynomial model proposed by Florack et al. as our local model of 
the optic flow. We define the M'th order polynomial model $v_M(x)$ of the vector 
field $v(x)$ at $x = 0$ as 

$$v_M^\mu(x) = \sum_{l=0}^{M} \frac{1}{l!}\, \tilde{v}^\mu_{\rho_1 \ldots \rho_l}\, x^{\rho_1} \cdots x^{\rho_l}. \qquad (5)$$

Here $x^\mu$ and $v^\mu$ denote the components of the spatiotemporal vectors 
$x, v \in \mathbb{R}^{N+1}$, and $\tilde{v}^\mu_{\rho_1 \ldots \rho_l}$ are the model coefficients 
approximating the partial derivatives of $v(x)$ at the origin, $x = 0$. 



4 Least Committed Optic Flow 

Ideally we want to obtain an optic flow field which has a normal component 
that is equal to the normal flow. This is an ill-posed problem, because an infinite 
number of solutions exist, due to our lack of knowledge about the tangential 
component. We intend to regularize this problem by locally modeling the tangential 
component of the optic flow field by a polynomial model, under the constraint 
that the normal component of the model should be close to the normal flow. We 
use a spatiotemporal integration scale-space to define the scale of validity of our 
local model. 

^2 The Lie derivative of a scalar function $f(x)$ is the derivative of $f(x)$ along the 
direction of a specified vector field $v$, $\mathcal{L}_v f = f_\mu v^\mu$. 

Kim S. Pedersen and Mads Nielsen 

This constrained optimization problem can be formalized by a functional 
$E(v(x;\varpi))$ of the optic flow field $v(x;\varpi)$ at integration scale $\varpi$, 
describing the degree of discrepancy between the normal flow field $u(x)$ and the 
normal component of the sought optic flow field under the Gaussian integration 
aperture^3 $G(x;\varpi)$. We define the discrepancy as the least squares difference, 
thus we can write the functional as 

$$E(v(x;\varpi)) = \int w(x)\, \| \eta\,(v \cdot \eta) - u \|^2\, G(x;\varpi)\, dx, \qquad (6)$$

where $\eta(x) = u(x)/\|u(x)\|$ is the normalized direction of the normal flow $u(x)$. 
The term $w(x)$ is a function of the uncertainty of the underlying normal flow 
estimates and acts as a weight, which penalizes poorly estimated normal flow 
vectors. In Sec. 5 we choose to use the numerical uncertainty of the normal flow 
as the penalty function $w(x)$, but other choices are of course possible. 
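A discrete version of this functional can be sketched as follows for the simplest case of a constant (zeroth order) flow candidate. The integral is replaced by a sum over sample points, each carrying a normal flow vector, a weight and an aperture value. The data layout and the function name are assumptions made for illustration only.

```python
import math


def discrepancy(v, samples):
    """Discrete least-squares discrepancy for a constant flow candidate v = (vx, vy).

    samples: list of ((ux, uy), weight, aperture) tuples, where (ux, uy) is the
    normal flow at a sample point, `weight` plays the role of w(x) and
    `aperture` the role of the Gaussian integration aperture at that point.
    """
    total = 0.0
    for (ux, uy), weight, aperture in samples:
        norm = math.hypot(ux, uy)
        if norm == 0.0:
            continue  # no normal direction is defined here; skipped in this sketch
        ex, ey = ux / norm, uy / norm          # normalized normal flow direction
        proj = v[0] * ex + v[1] * ey           # normal component of the candidate
        rx, ry = proj * ex - ux, proj * ey - uy
        total += weight * (rx * rx + ry * ry) * aperture
    return total
```

If the normal flows are exact projections of a single constant flow, the discrepancy vanishes at that flow and is positive elsewhere, provided at least two distinct normal directions are present.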

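We introduce the M'th order polynomial model $v_M$, defined in (5), into 
the functional $E(v(x;\varpi))$. The model parameters $\tilde{v}^\mu_{\rho_1 \ldots \rho_l}$ 
can be obtained by minimizing the functional $E(v_M(x;\varpi))$, that is 

$$\tilde{v}^\mu_{\rho_1 \ldots \rho_l} = \arg\min_{\tilde{v}} E(v_M). \qquad (7)$$

The integrand of the functional $E(v_M)$ is quadratic in the model parameters, 
which means that a minimum exists and that this minimum is a solution to the 
system of linear equations given by 
$\partial E / \partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l} = 0$. We therefore arrive at: 

Result 1 (M'th Order Optic Flow). The M'th order optic flow approximation 
$v_M(x;\varpi)$ is given by (5), where the model parameters 
$\tilde{v}^\mu_{\rho_1 \ldots \rho_l}$ are obtained by solving the minimization problem 
of (7). The solution is given by the following set of linear equations 

$$\frac{\partial E}{\partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l}} = \sum_{i=0}^{M} \frac{1}{i!}\, \tilde{v}^\nu_{\nu_1 \ldots \nu_i} \int_{x \in \mathbb{R}^{N+1}} w(x)\, \eta^\mu \eta_\nu\, M_{\nu_1 \ldots \nu_i \rho_1 \ldots \rho_l}(x;\varpi)\, dx - \int_{x \in \mathbb{R}^{N+1}} w(x)\, u^\mu\, M_{\rho_1 \ldots \rho_l}(x;\varpi)\, dx = 0, \qquad (8)$$

where $M_{\rho_1 \ldots \rho_l}(x;\varpi) = x^{\rho_1} \cdots x^{\rho_l}\, G(x;\varpi)$ is the l'th 
order mixed Gaussian monomial at integration scale $\varpi$. 

Proof. We seek the solution of 

$$\frac{\partial E}{\partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l}} = 0. \qquad (9)$$

By using the chain rule 
$\partial E / \partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l} = \int (\partial q / \partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l})\, (\partial F / \partial q)\, dx$, 
where $F(v_M)$ is the integrand of $E(v_M)$ and $q = v_M \cdot \eta$, we get 

$$\frac{\partial E}{\partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l}} = \frac{2}{l!} \int_{x \in \mathbb{R}^{N+1}} w(x) \left( \sum_{i=0}^{M} \frac{1}{i!}\, \eta^\mu \eta_\nu\, \tilde{v}^\nu_{\nu_1 \ldots \nu_i}\, x^{\nu_1} \cdots x^{\nu_i}\, (\eta \cdot \eta) - \eta^\mu (\eta \cdot u) \right) x^{\rho_1} \cdots x^{\rho_l}\, G(x;\varpi)\, dx = 0. \qquad (10)$$

Notice that $\eta \cdot \eta = 1$, $\eta^\mu (\eta \cdot u) = u^\mu$ and that the constant 
$2/l!$ can be removed. If we introduce the notation 
$M_{\rho_1 \ldots \rho_l}(x;\varpi) = x^{\rho_1} \cdots x^{\rho_l}\, G(x;\varpi)$, this system of 
linear equations can be written as 

$$\frac{\partial E}{\partial \tilde{v}^\mu_{\rho_1 \ldots \rho_l}} = \int_{x \in \mathbb{R}^{N+1}} w(x) \left( \sum_{i=0}^{M} \frac{1}{i!}\, \eta^\mu \eta_\nu\, \tilde{v}^\nu_{\nu_1 \ldots \nu_i}\, M_{\nu_1 \ldots \nu_i \rho_1 \ldots \rho_l}(x;\varpi) - u^\mu\, M_{\rho_1 \ldots \rho_l}(x;\varpi) \right) dx = 0. \qquad (11)$$

^3 For the sake of simplicity we choose to use a scale-space representation with only one 
integration scale parameter $\varpi$. The spatiotemporal scale-space aperture function of 
(2) can readily be interchanged with the aperture function $G(x;\varpi)$. 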



We developed our method at the origin of the vector field, $v(x = 0)$, which means 
that in general a translation to the point of interest should be introduced, which 
in turn transforms the integrals of Result 1 into convolution integrals. 

The l'th order mixed Gaussian monomials $M_{\rho_1 \ldots \rho_l}(x;\varpi)$ can be expressed in 
terms of linear combinations of the partial derivatives of the Gaussian by using 
the definition of Hermite polynomials and the separability of the Gaussian. 
This lets us interpret the two integrals of Result 1 as a set of scale-space 
derivatives of the normal flow $u$ and the matrix $\eta^\mu \eta_\nu$. 

Since the normal flow is defined to be parallel to the gradient direction, 
the matrix $(w\, \eta^\mu \eta_\nu) * G(x;\varpi)$ of Result 1 can be interpreted as the normalized 
spatiotemporal structure tensor. For the Florack et al. method 
the normal flow is only parallel to the gradient in the zeroth order case. This 
gives another possible interpretation of Result 1, namely that we are seeking 
the flow field which, by the product with the normalized structure tensor, is 
equal to the normal flow. The introduction of our flow model corresponds to a 
Taylor expansion of the normalized structure tensor. Other authors have used 
the structure tensor as the basis of methods for, among other things, motion 
analysis. 

In Result 1 we assume that the Gaussian derivatives are scale normalized 
and furthermore we use natural coordinates, $x/\sigma$. We use the standard method 
of scale normalization, which is based on dimensional analysis, and can be 
stated as 

$$\frac{\partial^n L_{\mathrm{norm}}}{\partial x^n} = \sigma^n\, \frac{\partial^n L}{\partial x^n}. \qquad (12)$$

In the context of feature detection Lindeberg has proposed a method in 
which the structure of the raw image controls the normalization factor. Pedersen 
and Nielsen have augmented this method by introducing the fractal dimension 
of the underlying image structure as a control parameter of the normalization 
factor. In this paper, we have chosen to use the standard normalization method 
instead of the more advanced normalization procedures, but for future work, it 
could be interesting to examine the use of image structure as a control parameter 
for normalization of derivatives in the context of optic flow field estimation. 

The structures of image sequences exist on different scales and consequently 
the optimal measurement scale will vary between different regions in an image 
sequence. In conjunction with optic flow and the related stereo vision problem, 
different approaches to the problem of scale selection have been taken: Kanade 
and Okutomi, Weber and Malik, Gårding and Lindeberg, Niessen et 
al., and Nielsen et al. 

In this paper an automatic local scale selection method will be used, similar 
to the one proposed by Niessen et al. They propose to use the numerical 
stability as a criterion for scale selection. As a measure of numerical stability they 
use the Frobenius norm $\|A^{-1}\|_F$ of the inverse of the matrix $A$ of the system 
of linear equations $Av = b$ used in the method by Florack et al. The Frobenius 
norm of a matrix is the square root of the sum of the squared singular values of 
that matrix. Since we use the Florack et al. multi-scale normal 
flow method, the outcome of Result 1 is a spatiotemporal integration scale-space 
of the vector field $v_M(x;\sigma,\tau,\varpi)$. An automatic scale selection mechanism 
can therefore be stated as the selection of the spatiotemporal integration scale 
triplet $(\sigma,\tau,\varpi)$ at each point in space-time for which the Frobenius norm 
of the inverse of the system matrix is minimal. Here the system matrix, 
$(w(x)\, \eta^\mu \eta_\nu) * M_{\nu_1 \ldots \nu_i \rho_1 \ldots \rho_l}$, represents the matrix part 
of the system of linear equations described in Result 1. 
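The selection rule can be sketched for the smallest possible case, a 2 x 2 system matrix per candidate scale. The sketch computes the Frobenius norm as the square root of the sum of squared entries (which equals the square root of the sum of squared singular values) and picks the scale whose inverse matrix has the smallest norm. The dictionary layout and all function names are assumptions of this example. A real implementation would evaluate this at every point in space-time and for larger matrices.

```python
import math


def frobenius_norm(m):
    """Frobenius norm: square root of the sum of the squared matrix entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def select_scale(candidates):
    """Return the scale whose system matrix has the inverse of smallest Frobenius norm.

    candidates: dict mapping a scale (or a scale triplet) to a 2 x 2 matrix.
    """
    return min(candidates,
               key=lambda s: frobenius_norm(inverse_2x2(candidates[s])))
```

For instance, a nearly singular matrix at one scale yields a large inverse norm and is therefore rejected in favour of a better conditioned scale.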

The algorithm proposed in this paper is clearly a sequential process of the 
computation of normal flow followed by the computation of the optic flow. The 
computations of the different scale-space derivatives in the normal flow and optic 
flow steps can readily be parallelized. The sequential structure of the process lets 
us implement the proposed algorithm in a highly modular fashion. 



5 Discussion of Performance 



We compute the zeroth and first order optic flow for different benchmark image 
sequences and compare the results with the findings of other authors. We use the 
angular error measure used by Barron et al. [3] as well as three of their synthetic 
benchmark image sequences with known ground truth: the translating trees 
(TTS), diverging trees (DTS), and Yosemite sequences. We also do a comparison 
on a synthetic fire particle sequence^4 (see Fig. 1). In order to show that our 
method can handle complex non-rigid motion, we compute the optic flow of a 
real sequence of smoke circulation in a pigsty (see Fig. 2). The angular error 
$\epsilon = \arccos(\hat{v}_c \cdot \hat{v}_e)$ is defined as the angle between the estimated 
vector $\hat{v}_e$ and the correct vector $\hat{v}_c$, where $v = (1, v^x, v^y)$ and 
$\hat{v} = v/\|v\|$. 

^4 Available through anonymous FTP, 
ftp://ftp.diku.dk/diku/users/kimstp/fire.tar.gz 

Figure 1. The middle image from the synthetic fire particle sequence (left) and 
the corresponding correct flow field (right). The sequence consists of 32 images 
of 256 × 256 gray value pixels with velocities between 0 and 3.79. We show every 
fourth vector for the middle image scaled by a factor of 3. 



In the following we use the abbreviations 

$$\Psi^{ij}_{kl}(x;\sigma,\tau,\varpi) = \left(w(x)\, \eta^i(x;\sigma,\tau)\, \eta^j(x;\sigma,\tau)\right) * M_{kl}(x;\varpi),$$
$$\Phi^{i}_{kl}(x;\sigma,\tau,\varpi) = \left(w(x)\, u^i(x;\sigma,\tau)\right) * M_{kl}(x;\varpi). \qquad (13)$$

In order to keep the notation simple we assume that each partial derivative 
of the Gaussians in $M_{kl}(x;\varpi)$ is scale normalized by (12). When using scale 
normalization of the derivatives it is important to remember that this effectively 
corresponds to changing the measurement units, hence the computed flow vectors 
are expressed in units of the scale. As an uncertainty measure of the normal flow, 
we use $w(x) = 1/\|A^{-1}\|_F$, where the matrix $A$ is given by the linear equations 
$Au = b$ defining the normal flow ((3) and (4)). 

We use the zeroth and first order spatial model for the optic flow even though 
our method in general lets us use spatiotemporal models. This means that we 
purely base our approximation of optic flow on a spatial analysis of the 
underlying spatiotemporal normal flow field. According to Result 1 the optic flow 
approximated by the zeroth order spatial model $v_0 = \tilde{v}$ is given by the solution 
to 

$$\Psi^{xx}\, \tilde{v}^x + \Psi^{xy}\, \tilde{v}^y = \Phi^x,$$
$$\Psi^{xy}\, \tilde{v}^x + \Psi^{yy}\, \tilde{v}^y = \Phi^y. \qquad (14)$$
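The zeroth order case is small enough to sketch end to end. The example below accumulates the weighted outer products of the normalized normal flow directions and the weighted normal flow vectors over a neighborhood, and then solves the resulting 2 x 2 system with Cramer's rule. The use of Cramer's rule with a determinant threshold (rather than a pseudo inverse), the sample layout and the function name are assumptions made for this illustration.

```python
import math


def solve_zeroth_order(samples, tol=1e-12):
    """Estimate a constant flow (vx, vy) from weighted normal flow samples.

    samples: list of ((ux, uy), weight, aperture) tuples; see the lead-in text.
    Raises ValueError when all normal directions are (nearly) parallel, which is
    the aperture problem showing up as a singular system.
    """
    pxx = pxy = pyy = phx = phy = 0.0
    for (ux, uy), weight, aperture in samples:
        norm = math.hypot(ux, uy)
        if norm == 0.0:
            continue  # no direction information at this sample
        ex, ey = ux / norm, uy / norm
        c = weight * aperture
        pxx += c * ex * ex
        pxy += c * ex * ey
        pyy += c * ey * ey
        phx += c * ux
        phy += c * uy
    det = pxx * pyy - pxy * pxy
    if abs(det) < tol:
        raise ValueError("singular system: normal directions are parallel")
    vx = (pyy * phx - pxy * phy) / det
    vy = (pxx * phy - pxy * phx) / det
    return (vx, vy)
```

With normal flows that are exact projections of the flow (2, 1) onto three different directions, the routine recovers (2, 1) up to rounding.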



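The optic flow modeled by the first order spatial model 
$v_1 = \tilde{v} + \tilde{v}_x x + \tilde{v}_y y$ 
is given by the solution to the system of linear equations given by the partial 
derivatives of the functional $E(v_1)$ with respect to the six model parameters 
$\tilde{v} = (\tilde{v}^x, \tilde{v}^y, \tilde{v}^x_x, \tilde{v}^y_x, \tilde{v}^x_y, \tilde{v}^y_y)$, 

$$\frac{\partial E}{\partial \tilde{v}^x} = \Psi^{xx} \tilde{v}^x + \Psi^{xy} \tilde{v}^y + \Psi^{xx}_{x} \tilde{v}^x_x + \Psi^{xy}_{x} \tilde{v}^y_x + \Psi^{xx}_{y} \tilde{v}^x_y + \Psi^{xy}_{y} \tilde{v}^y_y - \Phi^x = 0$$
$$\vdots$$
$$\frac{\partial E}{\partial \tilde{v}^y_y} = \Psi^{xy}_{y} \tilde{v}^x + \Psi^{yy}_{y} \tilde{v}^y + \Psi^{xy}_{xy} \tilde{v}^x_x + \Psi^{yy}_{xy} \tilde{v}^y_x + \Psi^{xy}_{yy} \tilde{v}^x_y + \Psi^{yy}_{yy} \tilde{v}^y_y - \Phi^y_y = 0. \qquad (15)$$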


In order to solve these linear equations we have used the pseudo inverse of the 
matrix since it can be close to singular. 

Table 1. Mean angular errors and corresponding standard deviations of 
approximated zeroth (M = 0) and first (M = 1) order optic flow based on zeroth 
order normal flow. The results have been computed at different sets of 
spatiotemporal and integration scales $(\sigma, \tau, \varpi)$. The last two rows show 
results produced with the automatic scale selection method discussed in Sec. 4. 
The scales were selected from the sets 
$\sigma \in \{1.0, 1.414, 2.0, 2.828, 4.0, 5.656, 8.0\}$, $\tau \in \{1.0, 2.0, 3.0\}$, 
$\varpi \in \{1.0, 2.0, 4.0, 8.0\}$. 

Parameters          TTS              DTS              Yosemite         Fire
                    Mean   St. dev.  Mean   St. dev.  Mean   St. dev.  Mean   St. dev.
M=0, (1,2,2)        0.99   4.50      2.43   3.42      20.15  17.37     5.92   14.61
M=1, (1,2,2)        0.87   1.76      1.17   1.74      21.83  19.00     5.78   14.79
M=0, (2,2,4)        0.42   0.87      3.41   2.64      17.16  13.89     7.88   14.60
M=1, (2,2,4)        0.52   1.23      1.49   1.88      16.94  13.45     7.42   14.78
M=0, (4,2,8)        0.31   0.19      5.19   3.10      18.42  11.95     12.88  16.62
M=1, (4,2,8)        0.25   0.25      1.49   0.88      17.30  10.69     11.71  17.75
M=0, multi-scale    0.34   0.23      5.02   3.86      11.50  15.66     7.85   15.57
M=1, multi-scale    0.15   0.11      0.80   0.47      8.51   12.57     7.55   15.65


We have computed the optic flow for the four synthetic benchmark sequences 
using fixed scales and automatic scale selection (Table 1). We see that for some 
types of sequences the automatic scale selection improves the results. This is 
not true for the fire sequence and for the zeroth order results for all but the 
Yosemite sequence. This indicates that scale selection based on numerical 
stability might not be the best solution. We believe that a way to improve this 
would be to incorporate information about the structure of the normal flow into 
the scale selection mechanism. Furthermore, for certain fixed fine scales we find 
that the first order model does not produce better results than the zeroth order 
model. The reason for this is that at fine scales the accuracy of the higher order 
partial derivatives needed in the first order model is reduced. Note as well that 
the zeroth order optic flow model does not handle the sequences consisting of 
non-translational motion well; this concerns the DTS, Yosemite, and fire 
sequences. This is not surprising considering that these sequences consist of a 
type of motion which is poorly modeled by this type of model. 

In Table 2 we show some of the results from Table 1 together with the best 
results of other optic flow techniques. Unfortunately we could only get results 
for the Yosemite sequence for the Alvarez et al. method. We see that both the 
zeroth and first order scale selected optic flow models perform as well as, and in 
some cases better than, other methods for most benchmark sequences. For the 
diverging trees and Yosemite sequences the results of the zeroth order optic flow 
are mediocre. The reason for this is, as mentioned above, that the zeroth order 
model is a poor model of this type of flow. We expect that the first order result 
for the Yosemite sequence is mediocre because of the apparent limitations of 
the scale selection method and our limiting choice of spatial models of the optic 
flow. Our method delivers good results for the fire sequence, which shows that 
it works well on sequences consisting of complex non-rigid motion. 

Table 2. Mean angular errors and standard deviations for different optic flow 
techniques obtained from Barron et al. [3] (the best results for different methods), 
Alvarez et al., and Florack et al. (first order multi-scale results). The 
last three rows are results obtained with our method. The fire results for other 
methods were computed by using the implementations by Barron et al. [3]. 

Techniques          TTS              DTS              Yosemite         Fire
                    Mean   St. dev.  Mean   St. dev.  Mean   St. dev.  Mean   St. dev.
Horn & Schunck      2.02   2.27      2.55   3.67      9.78   16.19     9.08   18.97
Uras et al.         0.62   0.52      4.64   3.48      8.94   15.61     14.68  25.64
Nagel               2.44   3.06      2.94   3.23      10.22  16.51     10.84  21.75
Anandan             4.54   3.10      7.64   4.96      13.36  15.64     14.38  22.77
Singh (step 2)      1.25   3.29      8.60   5.60      10.44  13.94     9.94   19.71
Alvarez et al.      -      -         -      -         5.53   7.40      -      -
Florack et al.      0.49   1.92      1.15   3.32      -      -         -      -
M=1, (1,2,2)        0.87   1.76      1.17   1.74      21.83  19.00     5.78   14.79
M=0, multi-scale    0.34   0.23      5.02   3.86      11.50  15.66     7.85   15.57
M=1, multi-scale    0.15   0.11      0.80   0.47      8.51   12.57     7.55   15.65



In Fig. 2 we show the zeroth and first order optic flow for a real complex 
sequence of smoke circulation in a pigsty^5. Notice that our method captures the 
circular motion in the sequence and that the produced fields follow the structure 
of the smoke. It is clear that the zeroth order model breaks down at several 
points, contrary to the first order model which seems to do a good job at almost 
all image points. We would expect that higher order models would improve the 
results to some extent, but it is also well known that too complex models would 
lead to over-fitting to the data and it is therefore important to choose the model 
order carefully. 

6 Conclusion 

In this paper we have presented an algorithm for approximation of optic flow 
based on a polynomial model of the local flow and regularized by a Gaussian 
integration scale-space. The method fits the normal component of the model 
to the underlying normal flow, which we presume is given, and the tangential 
component is extracted by integration of the local structure of the normal flow. 
In order to take full advantage of the multi-scale property of the method, we have 
suggested the use of an automatic local scale selection mechanism proposed by 
Niessen et al. We have compared the performance of the proposed method 
based on zeroth and first order models with that of other optic flow methods. 
We thereby show that the method with these models performs as well 
as, and in some cases outperforms, other optic flow methods. 

The optic flow method proposed in this paper is least committed in the 
sense that we model the local variation of the optic flow by a polynomial model 

^5 See http://www.diku.dk/users/kimstp/demos/ for more details. 

Figure 2. Smoke circulation in a pigsty. The sequence consists of 16 images of 
256 × 256 gray value pixels. Here we only show a part of the middle image of 
the sequence with the corresponding zeroth (left) and first (right) order scale 
selected optic flow. We plot every fourth vector scaled by a factor of 5. In the 
top of the image there is a circular motion from left to right and in the bottom 
there is a slow motion from right to left. 



for which the range of validity is determined by the choice of local integration 
scale. This makes the proposed algorithm a useful tool in e.g. experimental fluid 
dynamics, which we illustrated with an analysis of a sequence of smoke 
circulation in a pigsty. We measured the performance of our and other methods 
on a complex synthetic fire particle sequence consisting of non-rigid motion, 
thereby showing that our method is able to handle this type of motion better 
than other methods. 

When using this as well as other methods, it is important to choose the model 
order carefully, because the actual number of d.o.f. will vary across the image. In 
regions with no or little local flow structure we would expect that zeroth order 
flow would give us accurate measurements, because of the low number of d.o.f. 
Regions with a large amount of local flow structure lead to a large number of 
d.o.f. and therefore higher order models are necessary. We therefore believe that 
a local order selection mechanism, like the minimum description length principle, 
would be a valuable tool combined with the optic flow method proposed in this 
paper. 

Abstract. We address the theoretical problems of optical flow estimation 
and image registration in a multi-scale framework in any dimension. 
We start by showing, in the translation case, that convergence to the 
global minimum is made easier by applying a low pass filter to the images 
hence making the energy “convex enough” . In order to keep convergence 
to the global minimum in the general case, we introduce a local rigidity 
hypothesis on the unknown deformation. We then deduce a new natural 
motion constraint equation (MCE) at each scale using the Dirichlet low 
pass operator. This allows us to derive sufficient conditions for conver- 
gence of a new multi-scale and iterative motion estimation/registration 
scheme towards a global minimum of the usual nonlinear energy instead 
of a local minimum as did all previous methods. We then use an im- 
plicit numerical approach. We illustrate our method on synthetic and 
real examples (Motion, Registration, Morphing). 



1 Introduction 



Registration and motion estimation are among the most challenging problems in 
computer vision, with countless applications in various domains. 
These problems occur in many applications such as medical image analysis, 
recognition, visual servoing, stereoscopic vision, satellite imagery or indexation. 
Hence they have constantly been addressed in the literature throughout the 
development of image processing techniques. As a first example (Figure 1), consider 
the problem of finding the motion in a two-dimensional image sequence. We 
then look for a displacement $(h_1(x_1,x_2), h_2(x_1,x_2))$ that minimizes an energy 
functional: 



$$\iint |I_1(x,y) - I_2(x + h_1(x,y),\, y + h_2(x,y))|^2\, dx\, dy.$$

Next consider the problem of finding a rigid or non-rigid deformation 
$(f_1(x_1,x_2), f_2(x_1,x_2))$ between two images (Figure 1), minimizing an energy 
functional: 

$$\iint |I_1(x,y) - I_2(f_1(x,y), f_2(x,y))|^2\, dx\, dy.$$



M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 26-|^^ 2001. 
© Springer-Verlag and IEEE/CS 2001 

Fig. 1. Two images on the left: Finding the motion in a two-dimensional image 
sequence. Two images on the right: Finding a non-rigid deformation. 



Finally, consider the stereoscopic matching problem: given a stereo pair, the 
epipolar constraint allows one to split the two-dimensional matching problem into a 
series of line-by-line one-dimensional matching problems. One has just to find, 
for every line, the disparity $h(x)$ minimizing: 

$$\int |I_1(x) - I_2(x + h(x))|^2\, dx.$$



Although most papers deal only with motion estimation or matching, 
depending on the application in view, both problems can be formulated in the same 
way and be solved with the same algorithm. Thus the work we present can be 
applied both to registration for a pair of images to match (stereo, medical or 
morphing) and to motion field / optical flow for a sequence of images. In this paper 
we focus our attention on these problems assuming grey level conservation 
between both signals or images to be matched. Let us denote by $I_1(x)$ and 
$I_2(x)$ respectively the study and target signals or images to be matched, where 
$x \in D = [-M, M]^d \subset \mathbb{R}^d$, and $d \geq 1$. In the following $I_1$ and $I_2$ are 
supposed to belong to the space $C_0^1(D)$ of continuously differentiable functions 
vanishing on the domain boundary $\partial D$. We will then assume there exists a 
homeomorphism $f^*$ of $D$ which represents the deformation such that: 

$$I_1(x) = I_2 \circ f^*(x), \quad \forall x \in D.$$

In the context of optical flow estimation, let us denote by $h^*$ its associated motion 
field defined by $h^* = f^* - \mathrm{Id}$ on $D$. We thus have: 

$$I_1(x) = I_2(x + h^*(x)). \qquad (1)$$

$h^*$ is obviously a global minimum of the nonlinear functional 

$$E_{NL}(h) = \frac{1}{2} \int_D |I_1(x) - I_2(x + h(x))|^2\, dx. \qquad (2)$$
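For a sampled 1-D signal and a constant integer displacement, the nonlinear energy can be evaluated exhaustively, which makes the notion of a global minimum tangible. The sketch below treats samples outside the domain as zero, in line with the assumption that the signals vanish on the boundary. The restriction to integer translations, the brute-force search and the function names are assumptions of this example and not the scheme developed in the paper.

```python
def nonlinear_energy(i1, i2, shift):
    """0.5 * sum over x of |I1(x) - I2(x + shift)|^2 for an integer shift.

    Samples of I2 outside the domain are taken to be zero.
    """
    total = 0.0
    for x in range(len(i1)):
        y = x + shift
        target = i2[y] if 0 <= y < len(i2) else 0.0
        diff = i1[x] - target
        total += diff * diff
    return 0.5 * total


def best_shift(i1, i2, max_shift):
    """Integer shift in [-max_shift, max_shift] with the smallest nonlinear energy."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda h: nonlinear_energy(i1, i2, h))
```

An exhaustive search of this kind always finds the global minimum over the tested shifts, but its cost grows quickly with the dimension and with the number of unknowns, which is one motivation for iterative multi-scale schemes.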

We can deduce from (1) the well-known Motion Constraint Equation (also called 
Optical Flow Constraint): 

$$I_1(x) - I_2(x) \simeq \langle \nabla I_2(x), h^*(x) \rangle, \quad \forall x \in D. \qquad (3)$$

$E_{NL}$ is classically replaced in the literature by its quadratic version, substituting 
the integrand with the squared difference between both left and right terms of 
the MCE, yielding the classical energy for the optical flow problem: 

$$E_L(h) = \frac{1}{2} \int_D |I_1(x) - I_2(x) - \langle \nabla I_2(x), h(x) \rangle|^2\, dx.$$

Here $\nabla$ denotes the gradient operator. Since the work of Horn and Schunck, 
MCE (3) has been widely used as a first order differential model in motion 
estimation and registration algorithms. In order to overcome the too low 
spatio-temporal sampling problem, which causes numerical algorithms to converge to 
the closest local minimum of the energy $E_{NL}$ instead of a global one, 
Terzopoulos et al. and Adelson and Bergen proposed to consider it 
at different scales. This led to the popular coarse-to-fine minimizing technique. 
It is based on the remark that MCE (3) is a first order expansion 
which is generally no longer valid with the $h^*$ searched for. The idea is then 
to consider signals or images at a coarse resolution and to refine iteratively the 
estimation process. Since then many authors pointed out convergence properties 
of such algorithms towards a dominant motion in the case of motion estimation, 
or an acceptable deformation in the case of registration, 
even if the initial motion were large. Let us mention that many authors assume 
that deformation fields have some continuity or regularity properties, leading to 
the addition of some particular regularizing terms to the quadratic functional. 
This very short state of the art is far from being exhaustive, but 
it allows us to identify four common features shared by the most effective 
differential techniques: 



1. a motion constraint equation, 

2. a regularity hypothesis on the deformation, 

3. a multi-scale approach, 

4. an iterative scheme. 



However, most of the multi-scale approaches assume that the MCE is more
“valid” at lower resolutions. But to our knowledge, and despite the huge
literature, no theoretical analysis confirms this. It may come from the fact
that blurred signals or images are always “more similar”. Choosing a
particular low pass operator $\Pi_\sigma$ (here $\sigma > 0$ is proportional
to the number of considered harmonics in the Fourier decomposition) and some
deformation $f^* = Id + h^*$ satisfying a local rigidity hypothesis with
respect to a signal or image $I_1$, we shall find a linear operator
$P^\sigma_{I_1}$ depending on $I_1$ such that:

$$\Pi_\sigma(I_1 - I_2) \simeq P^\sigma_{I_1}(h^*), \qquad (4)$$

the sharpness of this approximation decreasing with respect to both the norm
of $h^*$ and the resolution parameter $\sigma$. We are faced with the
following motion size/structure hypothesis trade-off: for some fixed
estimation reliability, the larger the motion, the poorer its structure. This
transforms the problem into solving the energy minimization in a finite
dimensional subspace of approximation obtained through Fourier decomposition.
In this context we are led to consider the new energy to be minimized:



Considering general linear parametric motion models for $h^*$, we give
sufficient conditions for asymptotic convergence of the sequence of combined
motion estimations towards $h^*$ together with the numerical convergence of
the sequence of deformed templates towards the target $I_2$. Roughly speaking,
the shape of the theorem will be the following:

Theorem: If

1. at each step the residual deformation is “locally rigid”, and the
associated motion can be linearly decomposed onto an “acceptable” set of
functions the cardinality of which is not too large with respect to the scale,

2. the initial motion norm is not too large, and the conditioning of the
systems does not decrease “too rapidly” when iterating,

3. the estimated deformations $Id + h_i$ are invertible and “locally rigid”,

then the scheme “converges” towards a global minimum of the energy $E_{NL}$.

The outline of the paper is as follows. In Section 2 we recall the energy
convexifying properties of multi-scale approaches together with fast
convergence in the case of purely translational motion. In Section 3 we turn
to the general motion case and introduce a new local rigidity hypothesis and a
low pass filter in order to derive a new MCE of the type of equation (4). In
Section 4 we design an iterative motion estimation/registration scheme based
on the MCE introduced in Section 3 and prove a convergence theorem. In order
to avoid the a priori motion representation problem, we adopt an implicit
approach and constrain each estimated deformation to be at least invertible.
We show numerical results for some signals and the stereo problem in dimension
1, and for large deformation problems in dimension 2. Section 6 gives a
general conclusion to the paper.

2 Purely Translational Motion Estimation 

In this section we assume the motion to be found is only translational. This 
simple case will allow us to show the energy convexifying properties of multi- 
scale approaches together with fast convergence of iterative algorithms. 

2.1 Synthetic 1D Energy Convexifying Example

Consider a test signal (Figure 2) and its purely translated copies. The energy
given by the mean quadratic error between shifted test signals, considered as
a function of the translational parameter, can be convexified using signals at
a poorer resolution. Indeed we show the energy as a function of the
translation parameter calculated with the original test signals (Figure 2,
first line) and with the same signals at a poorer resolution (Figure 2, second
line), namely signals reconstructed with only the first 5 and 3 harmonics of
the Fourier basis. This readily yields more and more convexified energies as
the resolution is lower. Based on this convexifying property, a generic
algorithm for estimating the translational parameter is as follows:

1. Find the finest resolution j for which the energy is convex enough. 

2. Minimize the MCE-based energy with signals at resolution j. 

3. Refine the result by increasing the resolution and minimizing the new energy. 
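
The three steps above can be sketched in Python for a periodic 1D signal. This
is only an illustration under stated assumptions: the list of resolutions
(`harmonics`) is fixed by hand rather than chosen by a convexity test, and the
MCE-based minimization of step 2 is replaced by a plain search over integer
shifts in a window that shrinks as the resolution increases. The function
names are ours, not the authors'.

```python
import numpy as np

def lowpass(signal, n_harmonics):
    """Keep the mean and the first n_harmonics Fourier harmonics of a periodic signal."""
    spectrum = np.fft.rfft(signal)
    spectrum[n_harmonics + 1:] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def energy(a, b, shift):
    """Mean quadratic error between a and b translated back by `shift` samples."""
    return np.mean((a - np.roll(b, -shift)) ** 2)

def estimate_shift(i1, i2, harmonics=(3, 5, None)):
    """Coarse-to-fine estimate of the circular shift s such that i2[x] = i1[x - s].

    `harmonics` lists the resolutions from coarse to fine; None means the
    original signals. Assumption: integer shifts and a brute-force local
    search stand in for the MCE-based minimization of the text.
    """
    n = len(i1)
    estimate, radius = 0, n // 2          # the coarsest level searches all shifts
    for k in harmonics:
        a = i1 if k is None else lowpass(i1, k)
        b = i2 if k is None else lowpass(i2, k)
        candidates = range(estimate - radius, estimate + radius + 1)
        estimate = min(candidates, key=lambda s: energy(a, b, s))
        radius = max(1, radius // 4)      # refine in a smaller window
    return estimate % n
```

At the coarsest level the energy has few local minima, so the wide search is
safe; the finer levels only correct the estimate locally.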

2.2 Convergence Conditions 

In [?] we prove that this iterative process can converge to the solution
provided the initial motion norm is not too large with respect to the chosen
signal or image resolution. This one-dimensional result was easily extended to
dimension $d > 1$ (see [17]).

3 General Motion Multiresolution Estimation 

In Section 2 we have considered only purely translational motion estimation
and registration. Our purpose here is to address the general case for the
motion. Our approach is based on the fact that the motion is hidden in the
difference between both functions to be matched. This will lead us to analyze
this difference at some particular resolution. Making some assumptions on the
structure and local behaviour of the motion and the type of scale-space, we
will find a new MCE and show that we can control its sharpness, which has not
been addressed previously.



Fig. 2. Test signal. First line: on the left, the test signal; the second
signal is the same signal shifted by 200; on the right, energy as a function
of the shift parameter. There are numerous local minima around the global
minimum at x = 200 at scale 7. Second line: same energy with signals
reconstructed with only 5 harmonics (left) and 3 harmonics (right) using the
multiresolution pyramid spanned by the first elements of the Fourier basis.
3.1 Controlling the Residuals when Mixing Differential and Scale-Space
Techniques

Using a regularizing kernel $G_\sigma$ at scale $\sigma$, Terzopoulos et al.
[?] and Adelson and Bergen [?] were led to consider the following modified
MCE:

$$G_\sigma * (I_1 - I_2)(x) \simeq \langle G_\sigma * \nabla I_2(x), h^*(x) \rangle.$$

To our knowledge, and despite the huge literature on these approaches, no
theoretical error analysis can be found when such approximations are made.
However it has been reported from numerical experiments that the modified MCE
does not perform well at very coarse scales, thus betraying its progressive
lack of sharpness. Assuming a local rigidity hypothesis and adopting the
Dirichlet operator $\Pi_\sigma$, we will find a different right hand side
featuring a “natural” and unique linear operator $P^\sigma_{I_1}$ in the sense
that:

$$\Pi_\sigma(I_1 - I_2)(x) \simeq P^\sigma_{I_1}(h^*)(x), \qquad (5)$$

with a remainder, measured in some particular norm, vanishing as the scale is
coarser.



3.2 Local Rigidity Property 

In this paragraph we introduce our local rigidity property of deformations.

Definition 1. $f \in Hom(D)$ is $\xi$-rigid for $I_1 \in C^1(D)$ iff:

$$Jac(f)^T \cdot \nabla I_1 = \det(Jac(f)) \, \nabla I_1, \qquad (6)$$

where $Jac(f)$ denotes the Jacobian matrix of $f$, $\det(A)$ the determinant
of matrix $A$, and $Hom(D)$ the space of continuously differentiable and
invertible functions from $D$ to $D$ (homeomorphisms).

All $\xi$-rigid deformations have the following properties (see the proofs).

Assume $f^*$ is $\xi$-rigid for $I_1 \in C_0^1(D)$ and $I_1 = I_2 \circ f^*$. Then,

1. equation (6) is always true if dimension $d$ is 1;

2. suppose $d = 2$: then,

3. if $Jac(f^*)$ is symmetric, then (6) means that if $|\nabla I_1| \neq 0$,

— direction $\eta = \nabla I_1 / |\nabla I_1|$ is an eigenvector ($\lambda = \det(Jac(f^*))$ is an eigenvalue);

— direction $\xi = \nabla I_1^{\perp} / |\nabla I_1|$ is “rigid” ($\lambda = 1$ is an eigenvalue);

then for all $x \in D$ where $I_1$ is not locally constant we have $h(x) = h^*(x)$.



3.3 The Dirichlet Operator 

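Let $D = [-M, M]^d$ and
$S_\sigma = \{k \in \mathbb{Z}^d : \forall i \in [1, d], \ |k_i| \leq M\sigma\}$;
$c_k(I)$ denotes the Fourier coefficient of $I$ defined by:

$$c_k(I) = \frac{1}{(2M)^d} \int_D I(x) \, e^{-i \pi \langle k, x \rangle / M} \, dx.$$

Then the Dirichlet operator $\Pi_\sigma$ is the linear mapping associating to
each function $I \in C_0^1(D)$ the function $\Pi_\sigma(I) = G_\sigma * I$,
where the convolution kernel $G_\sigma$ is defined by its Fourier coefficients
as follows:

$$c_k(G_\sigma) = \begin{cases} 1 & \text{if } k \in S_\sigma, \\ 0 & \text{elsewhere.} \end{cases}$$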
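
On a sampled signal or image, the Dirichlet operator amounts to zeroing all
discrete Fourier coefficients outside a centred box of frequencies. The sketch
below is a minimal illustration, assuming periodic sampling with $n$ samples
per axis so that $M$ plays the role of $n/2$ and $\sigma = 1$ keeps every
frequency; the function name `dirichlet_operator` is ours.

```python
import numpy as np

def dirichlet_operator(image, sigma):
    """Ideal low-pass filter: keep Fourier coefficients with |k_i| <= M*sigma on every axis.

    Assumption: M is taken as n/2 for an axis with n samples.
    """
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fftn(image)
    mask = np.ones(image.shape, dtype=bool)
    for axis, n in enumerate(image.shape):
        k = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies along this axis
        keep = np.abs(k) <= (n / 2.0) * sigma
        shape = [1] * image.ndim
        shape[axis] = n
        mask &= keep.reshape(shape)               # box-shaped set of kept frequencies
    return np.real(np.fft.ifftn(spectrum * mask))
```

Because the mask only takes the values 0 and 1, applying the operator twice at
the same scale changes nothing: it is a projection, which is why the text also
speaks of the Dirichlet projection.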
Fig. 3. An example of motion $h = f - Id$ of a $\xi$-rigid deformation $f$ for
image $I_1$. We show a level set of image $I_1$ and the fields $\nabla I_1$
and $h$ along its boundary. $h$ varies only along the direction of
$\nabla I_1$.



3.4 New MCE by Linearization for the Dirichlet Projection 

Now that we have introduced our rigidity property of deformations and the
Dirichlet projection, we obtain the following theorem.

Theorem 1. If $f^* = Id + h^*$ is $\xi$-rigid for
$I_1 = I_2 \circ f^* \in C_0^1(D)$, then we have:

$$\|\Pi_\sigma(I_1 - I_2) - P^\sigma_{I_1}(h^*)\|_{L^2} \leq \ldots$$

This inequality is nothing but the sharpness of MCE (5):
$\Pi_\sigma(I_1 - I_2)(x) \simeq P^\sigma_{I_1}(h^*)(x)$, at scale $\sigma$.
It clearly expresses the fact that measuring the motion (e.g. perceiving the
optical flow) $h^*$ is not relevant outside of the support of $|\nabla I_1|$.

Proof. See [?].

4 Theoretical Iterative Scheme and Convergence Theorem

In Section 3 we found a new MCE and showed that we can control its sharpness.
In this section we will make a rather general assumption on the motion, in the
sense that it should belong to some linear parametric motion model, without
being more specific on the model basis functions. Though it is somewhat
restrictive to have motion fields in a finite dimensional functional space,
this structural hypothesis will be a key to bounding the residual motion norm
after registration in order to iterate the process. This makes it possible to
consider a constraint on motion when there is a priori knowledge (as for rigid
motion) or to consider a multi-scale decomposition of motion for an iterative
scheme.



4.1 Linear Parametric Motion Models and Least Square Estimation 

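Let us assume the motion $h^*$ has to be in a finite dimensional space of
deformations generated by basis functions $\Psi(x) = (\psi_i(x))_{i=1..n}$.
Thus $h^*$ can be decomposed in the basis: $\exists\, \theta^* =
(\theta^*_i)_{i=1..n}$ unique, such that:

$$h^*(x) = \langle \Psi(x), \theta^* \rangle = \sum_{i=1}^{n} \theta^*_i \psi_i(x), \quad \forall x \in Supp(|\nabla I_1|).$$

MCE (5) viewed as a linear model writes:
$\Pi_\sigma(I_1 - I_2) \simeq \langle P^\sigma_{I_1}(\Psi), \theta^* \rangle$.
Now set, for $\sigma$ such that the $P^\sigma_{I_1}(\psi_i)$ are mutually
linearly independent in $L^2$:

$$M_\sigma = P^\sigma_{I_1}(\Psi) \otimes P^\sigma_{I_1}(\Psi), \qquad Y_\sigma = \Pi_\sigma(I_1 - I_2),$$

where $\otimes$ stands for the tensorial product in $L^2$. Then applying basic
results from the classical theory of linear models yields:
$\hat{h} = \langle \Psi, \hat{\theta} \rangle = \langle \Psi, M_\sigma^{-1} B_\sigma \rangle$,
where the components of column $B_\sigma$ are defined by
$(B_\sigma)_i = \langle P^\sigma_{I_1}(\psi_i), Y_\sigma \rangle$.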
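
Once everything is sampled, the estimate above is an ordinary least squares
solve through the normal equations. The following sketch assumes the projected
basis functions $P^\sigma_{I_1}(\psi_i)$ have already been computed and stored
as the rows of an array; how they are obtained is not covered here, and the
function name is hypothetical.

```python
import numpy as np

def estimate_coefficients(projected_basis, y):
    """Solve M theta = B with M_ij = <p_i, p_j> and B_i = <p_i, y>.

    projected_basis: array of shape (n, n_samples), row i holding the samples
                     of the i-th projected basis function (assumed given).
    y:               array of shape (n_samples,), samples of Y_sigma.
    """
    p = np.asarray(projected_basis, dtype=float)
    y = np.asarray(y, dtype=float)
    m = p @ p.T                    # Gram matrix, discrete counterpart of M_sigma
    b = p @ y                      # right-hand side, discrete counterpart of B_sigma
    return np.linalg.solve(m, b)   # requires linearly independent rows
```

The solve fails or becomes unstable when the rows are close to linearly
dependent, which is the discrete face of the conditioning issue discussed in
the convergence analysis.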

4.2 Estimation Error and Residual Motion 

Given the least square estimation of the motion of last paragraph, we have 
Lemma 1. In this framework the motion estimation error is bounded by in- 
equality \\{h-h*)\Wh\ih 2 < fa'^+2(Tr(M-i)))^||/i*|V/i|^||2,. 

Proof. See [?].

If $Id + \hat{h}$ is invertible, we can define:

$$I_{1,1} = I_1 \circ (Id + \hat{h})^{-1}. \qquad (7)$$

Letting $r_1$ denote the residual motion such that
$I_{1,1} = I_2 \circ (Id + r_1)$, if $Id + \hat{h}$ is $\xi$-rigid for $I_1$
then a change of variables yields the equality
$\|(\hat{h} - h^*) |\nabla I_1|^{1/2}\|_{L^2} = \| r_1 |\nabla I_{1,1}|^{1/2} \|_{L^2}$,
thus giving by Lemma 1 the following bound on the residual motion norm:

In view of equality (7) and inequality (8), iterating the motion
estimation/registration process looks completely natural and allows for
pointing out sufficient conditions for convergence of such a process. Indeed,
provided the same assumptions are made at each step, relations (7) and (8) can
be seen as recurrence relations, yielding both the $r_p$ and $I_{1,p}$
sequences.

4.3 Theoretical Iterative Scheme 

Having control on the residual motion after one registration step, we deduce the 
following theoretical iterative motion estimation / registration scheme: 

1. Initialization: Enter accuracy $\epsilon > 0$ and the maximal number of
iterations $N$. Set $p = 0$, and $I_{1,0} = I_1$.

2. Iterate while ($\|I_{1,p} - I_2\| > \epsilon$ & $p < N$)

(a) Enter the set of basis functions $\Psi_p = (\psi_{p,i})_{i=1..n_p}$ that
linearly and uniquely decompose $r_p$ on the support of $|\nabla I_{1,p}|$.

(b) Enter scale $\sigma_p$ and compute:
$h_p = \langle \Psi_p, M_{\sigma_p}^{-1} B_{\sigma_p} \rangle$.

(c) Set $I_{1,p+1} = I_{1,p} \circ (Id + h_p)^{-1}$.
4.4 Convergence Theorem

Now that we have designed an iterative motion estimation / registration scheme, 
let us infer sufficient conditions for the residual motion to vanish. This leads us 
to state our following main result: 

Theorem 2. If: 

1. $\forall p \geq 0$, $I_{1,p} \neq I_2$ (as defined in Section 4.3), and the
residual motion $r_p$ can be linearly and uniquely decomposed on a set of
basis functions $\{\psi_{p,i}, \ i = 1..n_p\}$;

2. $\forall p \geq 0$, there exists a scale $\sigma_p > 0$ such that the set
of functions $\{P^{\sigma_p}_{I_{1,p}}(\psi_{p,i}), \ i = 1..n_p\}$ is free in
$L^2$ and, for $p = 0$, we assume that:

\\h*\Vh\i\\L. < (^^4+^Tr{Mo,ao)iy'; 

Set Co = 

3. The sequence of conditioning ratios satisfies the criterion: $\forall p \geq 0$,

4. $\forall p \geq 0$, the estimated deformations $Id + h_p \in Hom(D)$ and
are $\xi$-rigid for $I_{1,p}$;

Then, $\lim_{p \to \infty} \| r_p |\nabla I_{1,p}|^{1/2} \|_{L^2} = 0$.

Proof. See [?].

4.5 Numerical Algorithm Requirements 

Firstly, because $h^*$ is unknown, we have to make an arbitrary choice for the
scale at each step. Secondly, we at least have to ensure that $Id + \hat{h}$
is invertible at each step. Finally, we are faced with the choice of the
motion basis functions.



Multi-scale Strategy. The scale choice expresses both a priori knowledge on
the motion range and its structure complexity. Here we assume that
$(\sigma_p)_p$ is an increasing sequence, starting from $\sigma_0 > 0$ such
that:

$$\# S_{\sigma_0} \geq \#\{\text{expected independent motions}\}. \qquad (9)$$

Then let $\alpha \in \, ]0, 1[$. In order to justify the minimization problem
at the new scale $\sigma_{p+1} > \sigma_p$, we will choose it such that:

$$\|(\Pi_{\sigma_{p+1}} - \Pi_{\sigma_p})(I_{1,p+1} - I_2)\|_{L^2} \geq \alpha \|I_{1,p+1} - I_2\|_{L^2}. \qquad (10)$$

Invertibility of $Id + h_p$. Let $\beta > 0$. We will apply to $I_{1,p}$ the
inverse of the maximal invertible linear part of the computed deformation,
i.e. $(Id + t^*.h_p)^{-1}$, where

$$t^* = \sup \{ t \ / \ \det(Jac(Id + t.h_p)) \geq \beta \}. \qquad (11)$$
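
In dimension 1 the determinant in (11) reduces to $1 + t\,h_p'(x)$, so the
step can be computed in closed form from the most negative slope of the
estimated motion. The sketch below is an illustration under our own
assumptions: the motion is sampled on a regular grid, its derivative is
approximated by finite differences, and the step is capped at `t_max = 1`
(the full computed motion), a cap that the text does not state explicitly.

```python
import numpy as np

def max_invertible_step(h, beta=0.1, dx=1.0, t_max=1.0):
    """Largest t in [0, t_max] such that 1 + t * h'(x) >= beta at every sample (1D case)."""
    dh = np.gradient(np.asarray(h, dtype=float), dx)   # finite-difference derivative
    steepest = dh.min()
    if steepest >= 0.0:
        return t_max            # Id + t*h is increasing for every t >= 0
    return min(t_max, (1.0 - beta) / (-steepest))
```

For example, a motion with slope $-2$ everywhere and $\beta = 0.1$ gives
$t^* = 0.9 / 2 = 0.45$, whereas a motion with non-negative slope is applied
in full.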
Choosing the Set of Basis Functions. A major difficulty arising in the
theoretical scheme comes from the lack of a priori knowledge on the finite set
of basis functions to be entered at each step. In Section 5 we will use an
implicit approach via the optimal step gradient algorithm when minimizing the
quadratic energy associated to MCE (5).



5 Implicit Approach of Basis Functions and Results 



We now use the optimal step gradient algorithm for the minimization of the
quadratic functional associated to MCE (5). There are at least two good
reasons for doing this:

— the choice of basis functions is implicit: it depends on the signals or
images $I_1$ and $I_2$, and on the scale space;

— we can control and stop the quadratic minimization if the associated
operator is no longer positive definite.

The general algorithm does not guarantee that the resulting matrix
$M_{\sigma_p}$ is invertible. Hence we suggest systematically using a stopping
criterion to control the quadratic minimization, based on the descent speed or
simply on a maximum number of iterations $N_G$. In that case our final
algorithm writes:

1. Initialization: Enter accuracy $\epsilon > 0$ and the maximal number of
iterations $N$. Set $p = 0$, $I_{1,0} = I_1$, and choose the first scale
$\sigma_0$ according to (9).

2. Iterate while ($\|I_{1,p} - I_2\| > \epsilon$ & $p < N$ & $\sigma_p < 1$)

(a) Choose $\sigma_p$ satisfying (10).

(b) Apply $N_G$ iterations of the optimal step gradient algorithm for the
minimization of
$E_p(h) = \|\Pi_{\sigma_p}(I_{1,p} - I_2) - P^{\sigma_p}_{I_{1,p}}(h)\|^2_{L^2}$.

(c) Compute $I_{1,p+1} = I_{1,p} \circ (Id + t^*.h_p)^{-1}$ with $t^*$ defined
by (11) and increment $p$.

In the following experiments we have set $\alpha = 2.5\%$, $N_G = 5$,
$\beta = 0.1$. In [?] we show results on one-dimensional synthetic and real
signals, and with all intensity lines of a stereo pair. Recall that
$\xi$-rigidity is not a constraint when $d = 1$ and thus the estimated motion
is relevant only where $|I_1'(x)| \neq 0$.

We illustrate the algorithm on pairs of images with large deformation for regis- 
tration applications and movies for motion estimation applications. 



Registration Problems Involving Large Deformation: In Figure 4 we show the
different steps of the algorithm performing the registration between the first
and last images. In Figure 5 we show the study and target images, and the
deformed study image after applying the estimated motion.

Fig. 4. Registration movie of a target to a ’C’ letter. Again, each image
corresponds to a step in the iterative scheme.

Fig. 5. Scene registration example: Study image (left), deformed Study image
onto Target image (center), and Target image (right).



Optical Flow Estimation Examples: In Figure 6 we show the sequence of the
registered images of the original Cronkite sequence onto the first image using
the sequence of computed backward motions. The result is expected to be
motionless. On top of Figure 6 we show the complete movie obtained by
iteratively deforming only the first image of the Cronkite movie. For that we
use the sequence of computed motions between each pair of consecutive images
of the original movie. In Figure 6, on the bottom, we see the error images.

6 Conclusion 

We have addressed the theoretical problems of motion estimation and
registration of signals or images in any dimension. We have used the main
features of previous works on the subject to formalize them in a framework
allowing a rigorous mathematical analysis. More specifically, we wrote a new
rigidity hypothesis that we used to infer a unique Motion Constraint Equation
with small remainder at coarse scales. We then showed that, under hypotheses
on the motion norm and structure/scale trade-off, an iterative motion
estimation/registration scheme could converge towards the expected solution of
the problem, i.e. the global minimum of the nonlinear least square problem
energy. Since each step of the theoretical scheme needs a set of motion basis
functions which are not known, we have designed an implicit algorithm and
illustrated the method in dimension one and two, including large deformation
examples.

Fig. 6. On the left, registered sequence of the original sequence onto the
first image using the computed backward motions. On the right, movie obtained
by deforming only the first image of the Cronkite movie using the sequence of
computed motions.
Abstract. According to the Marr paradigm [?], visual processing is
performed by low-level feature detection followed by higher level task
dependent processing. In this case, any two images exhibiting identical
features will yield the same result of the visual processing. The set of
images exhibiting identical features forms an equivalence class: a metameric
class [?]. We choose from this class the (in some precise sense) simplest
image as a representative. The complexity of this simplest image may in
turn be used for analyzing the information content of features. We show
examples of images reconstructed from various scale-space features, and
show that a low number of simple differential features carries sufficient
information for reconstructing images close to identical to the human
observer. The paper presents direct methods for reconstruction of minimal
variance representatives, and variational methods for computation
of maximum entropy and maximum a posteriori representatives based on
priors for natural images. Finally, conclusions on the information content
in blobs and edges are indicated.



1 Introduction 

An image is often perceived as the graph of an intensity function over the spatial 
domain. Not every characteristic of this graph is of interest to the observer.
An observer only interested in image edges will consider two images having the
same edges as being by definition identical, since all observables (edges) are
identical.
This leads us to the conclusion that the most appropriate operational definition 
of image structure is a collection of operationally defined image features. The 
set of image features is not fixed for all observers/tasks, thus two images of 
identical structure for one task may have deviating structure for another task 
(set of features). Since the number of possible different operationally defined 
features is infinite, we may attribute an infinite number of different structures 
to a given image. 

We will only investigate a small subset of all possible operationally defined 
features, namely those that can be expressed solely in terms of local scale-space 
derivatives. In particular, we address blobs and edges. We adapt the approach
of Gaussian scale-space [?] so that multi-local features may be expressed as
features at a finite scale [?]. The features of an image change as a function
of the free scale parameter, and this total scale-space behavior of features
in general describes the image graph to a very high degree. As examples, it
has been proven that one may reconstruct the image from the multi-scale
zero-crossings of the Laplacian [?], or from the scale-space top-points alone
in the analytical case [?].

In this paper, we will not investigate the case where the image may be
uniquely reconstructed from the features. We investigate the case where the
detected features define a metameric class of images. As a representative of
this class we choose the simplest image in the class. Simplicity is defined
relative to an expectation (prior), and we suggest the use of different
priors. Among these we propose Gaussian intensity distribution, photon
distribution entropy, and a Brownian motion model for natural images [2].

When using the maximum of photon distribution entropy, our work resembles the
work of Zhu, Wu, and Mumford [?] in their constructive image processing. They
use outcomes of filters for building a stochastic model of a class of images
or textures. They approximate the distribution of images as the maximum
entropy distribution yielding the same feature statistics. In our work, we do
not look at an ensemble of images and a distribution of features, but at one
given set of features detected in an image and the class of images having
these features.

The present work may be used for image representation (compression). We 
show that few features are sufficient to describe an image to a very high degree. 
Furthermore, the present work yields an easy way to grasp the intuition of
which information a feature actually carries about the image. We see the major
contribution of this paper not as a matter of efficient image coding, but as a
description of the actual information content in features and combinations
thereof.
We wish to gain insight into features, their selection, and importance. 

Elder and Zucker [?] presented work on reconstructing images from scale-space
edges. However, they did not reconstruct images that exhibit the same
features, but merely used the scale of an edge to indicate the slope of the
edge in the image and then reconstructed the image as a minimal surface. We
wish to emphasize that the present image representation scheme is a
projection: the reconstructed image exhibits observables identical to those
measured in the original image.

In the following we briefly describe the features of choice and the computation 
of the representative of the metameric class. In terms of image coding or image 
representation, we will denote this the reconstruction of the image from the 
features. Finally, we give examples and draw conclusions. 



2 Scale-Space Feature Detection 

The scale-space image $L(x, \sigma)$ is constructed by convolving the original
image $I$ with a Gaussian of standard deviation $\sigma$, and derivatives
hereof by convolution with scale-normalized Gaussian derivatives so that

$$\partial_{\tilde{x}} L(x, \sigma) = \sigma^\gamma \partial_x L(x, \sigma)$$

where the $\sigma$-term gives the scale normalization and $\gamma$ is a free
parameter for feature-dependent tuning of the scale selection [?]. Normally
$\gamma = 1$ but for some features a better scale selection is obtained by
choosing a different $\gamma$. This is related to the local Hausdorff
dimension of the image graph close to the feature [?].

Feature detectors can now be created as non-linear combinations of
scale-space derivatives:

Feature   Strength       Spatial                Scale                       γ
Blob      ΔL             max_x(ΔL)              max_σ(ΔL)                   1
Edge      L_w            L_ww = 0               max_σ(L_w)                  1/2
Corner    L_w^2 L_vv     max_x(L_w^2 L_vv)      max_σ(L_w^2 L_vv)           1
Ridge     L_vv           L_vvv = 0              max_σ(L_vv)                 3/4

Here v is along the isophote direction and w along the gradient direction.
max_σ denotes a maximum over scale and max_x denotes a spatial maximum.
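
As an illustration of the blob row of the table, the sketch below computes the
scale-normalized Laplacian $\sigma^{2\gamma} \Delta L$ in the Fourier domain
and selects, at a given pixel, the scale of maximal response among a list of
candidates. It is a minimal example under our own assumptions (periodic image
boundaries, a hand-picked list of scales, spatial position given rather than
detected), not the detector used in the experiments; the function names are
ours.

```python
import numpy as np

def normalized_laplacian(image, sigma, gamma=1.0):
    """Scale-normalized Laplacian sigma^(2*gamma) * (L_xx + L_yy) at scale sigma.

    Gaussian smoothing and differentiation are both done in the Fourier
    domain, which assumes periodic image boundaries.
    """
    image = np.asarray(image, dtype=float)
    wy = 2.0 * np.pi * np.fft.fftfreq(image.shape[0])[:, None]
    wx = 2.0 * np.pi * np.fft.fftfreq(image.shape[1])[None, :]
    w2 = wx ** 2 + wy ** 2
    smoothed = np.fft.fft2(image) * np.exp(-0.5 * sigma ** 2 * w2)   # Gaussian blur
    laplacian = np.real(np.fft.ifft2(-w2 * smoothed))
    return sigma ** (2.0 * gamma) * laplacian

def blob_scale(image, row, col, sigmas):
    """Among the candidate scales, return the one maximizing |normalized Laplacian| at a pixel."""
    responses = [abs(normalized_laplacian(image, s)[row, col]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]
```

For an isotropic Gaussian blob and $\gamma = 1$, the selected scale matches
the standard deviation of the blob, which is what makes the scale attribute of
a blob feature informative about the size of the underlying structure.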

These feature detectors select a number of points of interest or attention 
in the image, and their corresponding scale. In principle, we could choose any 
measurement of the image as representation of the image. However, we believe 
that the abovementioned feature points are special points, and in the following 
we will investigate their information contents. 



3 Selection of a Representative of a Metameric Class 



Assume a set of point features is given. Each is given in terms of a number of
localized linear filters and the corresponding filter values. That is, we know
that the image $I(x)$ satisfies the constraints:

$$\int_\Omega I(x) f_i(x) \, dx = c_i, \quad i = 1 \ldots K. \qquad (1)$$

Many images $I : \mathbb{R}^2 \mapsto \mathbb{R}_+$ may fulfill Eq. (1): they
form the metameric class. We are interested in one representative of this
class. We select this representative as the image that is simplest according
to one of three different complexity measures (minimizing $V$ or $B$, or
maximizing $H$):

$$V = \int_\Omega (I(x) - \mu)^2 \, dx \qquad (2)$$

$$H = -\int_\Omega I_n(x) \log I_n(x) \, dx \qquad (3)$$

$$B = \int_\Omega |\nabla I|^2 \, dx \qquad (4)$$

where

$$I_n(x) = \frac{I(x)}{\int_\Omega I(x) \, dx}.$$

The intensity variance ($V$) corresponds to assuming a Gaussian prior on
intensities. In this way the simplest image is the maximum a posteriori
estimate of the image given the features. The measure must be minimized to
yield this. This measure is very simple, but has the drawback that the
complexity is defined globally and thereby does not take the concept of
homogeneous regions into account (see Fig. 1).

The photon distribution entropy ($H$) is the stochastic complexity of the
distribution of the position of a photon hitting the image. That is, the
distribution corresponds to an image normalized in intensity to unity. A
reconstruction of maximum entropy is thus as close as possible to the uniform
distribution (uniform image). This measure nicely takes into account the fact
that image intensities are positive values. It is a measure of the global
variance. Like the above variance measure, it does not take homogeneous
regions into account.

The local variation ($B$) measures how much the image varies locally. The
reconstruction according to this measure will locally be as homogeneous as
possible. The reconstruction may be derived as the maximum a posteriori
estimate according to a prior for natural images. The prior is a Gaussian
Brownian motion in intensity:

$$P(J) \propto \prod_{\Omega} e^{-\frac{J_x^2 + J_y^2}{2 \sigma^2}}$$

This has been shown to approximate the distribution of natural images [2].
However, more refined analyses show that the distribution of the local
variation in natural images is not exactly Gaussian, and the spatial
correlation pattern in images of, e.g., a forest shows a little less
correlation than the classical Brownian motion model [?]. Nevertheless, we
will use the Brownian motion model since it is simple and has the property,
shared by none of the above, that homogeneous regions are taken into account
(see Fig. 1).







Fig. 1. Reconstruction of iconic images (64x64) from the various priors using 
only very sparse information. From left to right are original image, minimum 
variance reconstruction, maximum entropy reconstruction, and Brownian motion 
model reconstruction. 32 edge points are used for the edge image and 9 blobs 
for the blob image. 
4 Linear Reconstruction



In this section we show how the variance measure leads to a very simple, direct, 
and linear computation of the reconstruction. 

Using standard techniques of calculus of variations and Lagrange multipliers,
the distribution satisfying Eq. (1) and having minimal variance satisfies

$$\frac{\delta}{\delta J} \left[ \int_\Omega (J - \mu)^2 + \sum_{i=0}^{K} \lambda_i f_i J \, dx \right] = 0,$$

where $J$ is the reconstruction and we have identified $f_0$ as the unit
function over $\Omega$ to ensure that $\int_\Omega J \, dx = c_0$. Since the
variance is a convex functional and the constraints are linear, the solution
is unique:

$$J = \sum_i \lambda_i f_i. \qquad (5)$$

The “only” remaining problem is to identify the values of the Lagrange
multipliers $\lambda_i$ so that $J$ satisfies Eq. (1). In general, this may be
a hard problem. However, in the case above, everything is linear, and we may
construct a simple linear solution:

The reconstructed image must exhibit the same features as the original so
that

$$c_i = \int_\Omega f_i \sum_j \lambda_j f_j \, dx = \sum_j \lambda_j \int_\Omega f_i f_j \, dx.$$

Hence we identify

$$\lambda_j = \sum_i \left( \int_\Omega f_i f_j \, dx \right)^{-1} c_i,$$

where the inverse is computed as the matrix inversion of
$a_{ij} = \int_\Omega f_i f_j \, dx$. Finally, we find the reconstructed image
$J$ as

$$J = \sum_j f_j \sum_i \left( \int_\Omega f_i f_j \, dx \right)^{-1} c_i, \qquad (6)$$

where one may recognize the terms in front of $c_i$ as the pseudo inverse of a
matrix $F$ containing columns of values of the filters $f_i$ defined on a
discrete domain. Above, we kept the continuous formulation all the way through
the reasoning in order to demonstrate that the resulting image $J$ is defined
on a continuous domain if the filters are too. However, in the experiments
below we use the discrete formulation and the pseudo inverse for computations:

$$J = F (F^T F)^{-1} c. \qquad (7)$$

For examples of actual reconstructions see Fig. 2.
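
The discrete formula (7) is a few lines of linear algebra. The sketch below
assumes the filters are stored as the columns of a matrix `filters` acting on
a flattened image, and that these columns are linearly independent; a constant
column can be included to play the role of $f_0$. The function name is ours.

```python
import numpy as np

def min_variance_reconstruction(filters, c):
    """Return J = F (F^T F)^{-1} c, the element of span(F) reproducing the measurements c.

    filters: array of shape (n_pixels, K), one sampled filter per column.
    c:       array of shape (K,), the measured filter responses F^T I.
    """
    f = np.asarray(filters, dtype=float)
    gram = f.T @ f                                   # discrete a_ij = <f_i, f_j>
    lam = np.linalg.solve(gram, np.asarray(c, dtype=float))   # multipliers lambda
    return f @ lam
```

Solving the $K \times K$ system is preferable to forming the inverse
explicitly; the cost is governed by the number of features, not by the number
of pixels.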
Fig. 2. Minimum variance reconstruction of Lena’s left eye and tree image. From
left to right: reconstructions on basis of 200 blobs and 200 edge points, original 
images, and reconstructions based on 40 blobs and 40 edge points. 

5 Variational Reconstruction 

Above, we gave a linear closed form solution for reconstructing the image of 
minimum variance. However, in general, we would like to be able to use alterna- 
tive priors. This may be the maximum entropy criterion on photon position or 
on basis of other empirically verified priors on natural images mrm\ . Below we 
outline a variational approach as a standard constrained gradient descend. For 
general image functionals it yields a local maximum. For convex functionals this 
is obviously also the global maximum. 

Assume an energy functional E[J] to optimize under the feature constraints.
Assume also a suboptimal representative of the metameric class
satisfying these constraints. In practice we compute this as the minimal variance solution
above. A gradient descent in energy reads

$$\partial_t J = -E_J$$

where $E_J = \delta E / \delta J$. This makes the solution depart from the metameric class. We
construct an evolution equation constrained to the metameric class by project- 
ing the solution back onto the metameric class. We may then talk about an 
observation-constrained evolution. 

Since the feature constraints are linear we obtain:

$$\partial_t J = -\left( E_J[J] - E_J[J]_{\perp M} \right) \tag{8}$$

where $E_J[J]_{\perp M}$ is defined as the part of $E_J$ orthogonal to the metameric class:

$$E_J[J]_{\perp M} = \sum_{i,j} f_j \left( \int_\Omega f_i f_j \, dx \right)^{-1} \int_\Omega f_i E_J[J] \, dx$$

This is derived as the reconstruction (Eq. 6) based upon the filter measurements
of the variation $E_J$. The corresponding discrete formulation (Eq. 7) reads:

$$E_J[J]_{\perp M} = F (F^T F)^{-1} F^T E_J[J]$$

Minimizing Eq. 3 yields the maximum entropy reconstruction. Starting from the
minimum variance solution (Eq. 6), a variational solution is reached by Eq. 8,
where

$$E_J[J] = 1 + \log J.$$



This scheme is guaranteed to converge as the entropy is convex and the con- 
straints linear. 

Likewise, for the local variation functional based on the Brownian motion 
prior: 

$$E_J[J] = -\Delta J.$$



In this case the complexity minimization is a standard image diffusion process
creating a Gaussian scale-space. The evolution preserving observations (Eq. 8)
may then be denoted observation-constrained diffusion of images. This functional,
derived from a Brownian motion model of natural images, is convex, hence the
evolution is guaranteed to converge towards the global optimum.
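A minimal sketch of observation-constrained diffusion in the discrete setting is given below. It assumes a 1D signal with periodic boundaries, an explicit Euler step, and the same hypothetical Gaussian filters as columns of F; the step size and iteration count are illustrative choices, not values from the experiments.

```python
import numpy as np

def smoothness_energy(J):
    """Discrete Brownian-prior energy: half the sum of squared differences."""
    return 0.5 * np.sum((np.roll(J, -1) - J) ** 2)

def constrained_diffusion(J0, F, step=0.2, n_iter=500):
    """Diffuse J while keeping the filter measurements F^T J fixed."""
    # Projector onto the span of the filters: P = F (F^T F)^{-1} F^T.
    P = F @ np.linalg.solve(F.T @ F, F.T)
    J = J0.copy()
    for _ in range(n_iter):
        # E_J = -Laplacian(J), discretized with periodic boundaries.
        E_J = -(np.roll(J, 1) - 2.0 * J + np.roll(J, -1))
        # Subtract the part of E_J that would change the measurements.
        J -= step * (E_J - P @ E_J)
    return J

# Hypothetical toy setup: three Gaussian filters, sinusoidal test signal.
x = np.arange(64.0)
F = np.stack([np.exp(-(x - m) ** 2 / 32.0) for m in (16.0, 32.0, 48.0)], axis=1)
signal = np.sin(x / 7.0)
c = F.T @ signal
J0 = F @ np.linalg.solve(F.T @ F, c)   # minimum variance start
J = constrained_diffusion(J0, F)
```

Because every update lies in the null space of F^T, the measurements stay fixed while the smoothness energy does not increase for step sizes below 0.5.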



6 Experiments 

In this section we give some simple examples to approach the information con- 
tents of edges and blobs. We examine the suitability of the various priors to 
create visually appealing reconstructions from blobs and edges. We examine how 
to represent the features; by their feature strength or by the fact that they are 
features (local maxima of feature strength). We discuss and show experiments 
of how to select features, and finally we show that feature points are better for 
representing images than random points. 

Priors and Features. The lower the probability of an event is, the more in- 
formation it carries. Thus, optimally we choose features representing unlikely 
events with respect to the image prior. Intuitively, the Brownian motion model 
wants to create images that are as smooth as possible. Edges indicate the lo- 
cal “unsmoothness” of the image. This suggests that the interplay of edges and 
the Brownian prior is optimal. Blobs contain information on the dark and light 
regions of the image. This suggests that blobs are well suited for the global 
measures, especially the variance measure. These points are illustrated in Fig.^ 

Feature Representation. Having selected a feature type for representing an
image, the question is how to represent or “code” this feature. To our mind, two
natural choices exist: A feature point may be represented by the fact that it is a
feature or by the values of the corresponding feature measure. In the latter case,
there is no guarantee that the reconstruction will exhibit the same feature (have
a local maximum in feature strength), but it exhibits the same feature strength.
In the first case there is no guarantee that the feature in the reconstruction will
have the same feature strength. In Fig. 3 we compare the variance of blob and edge
reconstructions based on different orders of derivatives. The first order image
structure codes the edge strength and the edge orientation. The second order
image structure codes either edge presence (plus one directional second order
derivative) or blob strength (plus the orientation and elongation, see below).
The third order image structure indicates the presence of a blob (plus two more
directional third order derivatives).

Fig. 3. Information carried by presence and strength of blobs and edges. The
graph shows the minimal variance for a reconstruction given the selected features.
The features have been selected ordered by their individual feature strength.
The topmost line is the theoretical largest variance: the variance of the input image.
The image (left column, middle) shows a reconstruction based on 80 blobs represented
by 3rd order structure. Above and below are blob representations using 2nd order
structure. Above is equally much variance, below is equally many blobs. The right
column shows reconstructions based on edges. Middle: reconstruction based on 160
edge points represented by 2nd order structure. Below: same edges but with
1st order structure. Above: same variance is obtained by 80 edge points using
1st order structure.



It is evident from Fig. 3 that blob structure (the total 2nd order structure
in blobs) carries much more information (variance) than the edge structure. It 
is also evident that the feature strength measures carry more information than 
the feature presence measures. This is due to the lower order of differentiation of 
the information. The second order structure is invariant to addition of an affine 
intensity function, whereas the third order structure is invariant to addition of 
a second order polynomial. 

Feature Type. Intuitively edges and blobs carry different information on images.
In Fig. 4 reconstructions of equal variance are compared. These experiments
show what one could expect: Edges carry information on the rapid transitions
whereas blobs carry information on the global structure.

Blob Representation. The blobs were shown above to carry much information
in their full second order structure. Actually, the feature strength measure
is the Laplacian. In Fig. 5 reconstructions based on the full structure and only
the Laplacian are compared. It is evident that the elongation and orientation

Fig. 4. Comparison of the structure in edges and blobs. The graph shows the
variance as a function of the number of edges (1st order rep.) and blobs (2nd order
rep.). The first column of images shows reconstructions based on blobs (20, 80, 250
from above). The last column shows reconstructions based on edge points (80, 230,
250 from above). 80 or 230 edge points and 20 blobs contain approximately
equally much variance. 250 edge points and 80 or 250 blobs contain equally
much variance.



of the blobs carried by the full second order structure reveal much information. 
Furthermore, the figure shows that blob presence carries as much information as 
blob strength, but only when many blobs are selected. Few blob strengths carry 
more information than the presence of the same few blobs. 

Selection of Features. Given a set of image observations in terms of measure- 
ments in feature positions, we wish to select a subset of maximum information. 
This is a hard problem. Instead, we look at how much each observation carries
individually, as if it were used alone for reconstruction. In this case the minimum
variance reconstruction is $J = F^T (F F^T)^{-1} c$ (with the filters now stored as the
rows of F), and thereby (zero mean images)

$$\mathrm{Var}[J] = J^T J = c^T (F F^T)^{-T} F F^T (F F^T)^{-1} c = c^T (F F^T)^{-1} c$$

This is for simplicity given in the discrete notation, but the result also holds
in the continuous formulation. Indicated below is the information of a scale-normalized
($\gamma = 1$) derivative of value c at scale $\sigma$:

Fig. 5. The graph shows the variance as a function of number of blobs for 3 different
blob representations: full 2nd order structure, spatial and scale derivatives
of feature strength ($(\Delta L)_x$, $(\Delta L)_y$, $(\Delta L)_t$), and feature strength ($\Delta L$). To the
right are corresponding images for 80 blobs from above and left to right below.



The information is dependent on the derivation but independent of scale. This
shows that as long as features do not overlap they should be chosen directly
from their scale-normalized feature strength. It should be noticed, however, that
edges used scale-normalized derivatives with respect to $\sqrt{\sigma}$ and thereby actually
carry information proportional to $c^2 \sigma$ for first order and $c^2 \sigma^2$ for second order.
We propose a greedy algorithm for selection of features, choosing features in
descending order of filter value. This algorithm does not take into account the interplay of
features, and is thereby not optimal, but extremely simple as all features may be
judged independently. This strategy was used in all of the above experiments.
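The greedy strategy can be sketched as follows, assuming (as in the variance formula of this subsection) that the filters form the rows of F. The function names and the small orthogonal toy filters are hypothetical; the ranking uses the variance each observation would yield if used alone, which for a single filter f with value c reduces to c^2 / (f . f).

```python
import numpy as np

def individual_information(F, c):
    """Variance c_i^2 / (f_i . f_i) of a reconstruction from feature i alone."""
    return c ** 2 / np.sum(F * F, axis=1)

def greedy_select(F, c, k):
    """Indices of the k features with the largest individual information."""
    order = np.argsort(-individual_information(F, c), kind="stable")
    return order[:k]

def reconstruction_variance(F, c):
    """Var[J] = c^T (F F^T)^{-1} c for the minimum variance reconstruction."""
    return float(c @ np.linalg.solve(F @ F.T, c))

# Hypothetical toy example with three mutually orthogonal filters.
F = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
c = np.array([3.0, 2.0, 1.0])

info = individual_information(F, c)
chosen = greedy_select(F, c, 2)
total = reconstruction_variance(F, c)
```

For non-overlapping (orthogonal) filters, as in this toy case, the individual contributions add up to the total variance; with overlapping filters they generally do not, which is why the greedy ranking is not optimal.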

In Fig. 6 we compare different strategies for selection of features. Firstly,
we choose the strategy mentioned above, selection based on largest feature
strength first. Secondly, we choose a strategy of randomly selecting blobs. Finally,
we select random points in scale-space. The random points are uniformly
distributed over scale-space taking the natural scale-space metric into account
($\Delta x \propto \sigma$, $\Delta y \propto \sigma$, $\Delta\sigma \propto \sigma$). We conclude from the figure that blobs selected by
feature strength carry more information than randomly selected blobs, and that
blobs carry more information than blob strength in random points.



7 Discussion 

Based on the experiments above, we make the following comments on how much 
information different types of features and their representation carry: 

- Images may be reconstructed to a very high degree of visual precision based 
upon edge strength and blob strength in a number of detected feature points. 

- Using only blob strength does not carry sufficient information for visually 
pleasing reconstructions. 

- Taking the total blob structure (all 3 second derivatives) into account yields 

Fig. 6. The graph shows the variance of the reconstruction based on 2nd order
structure as a function of number of points for three different point selection 
strategies: Blobs by feature strength (top row, 20 and 80 blobs), random blobs 
(middle row, 80 and 160 blobs), and random points (80 and 160 points). Selection 
by strength of 20 blobs has same variance as random selection of 80 blobs and 
random selection of 160 points. 



potentially much more visually appealing reconstructions, but still not all infor- 
mation in the image has been collected. 

- Edge strength and orientation alone carry more information about images
than blob strength, but not as much as the total blob structure in blobs.

- As was also predicted theoretically by looking at independent features, feature
strength carries more information than feature presence.

- In order to create truly visually appealing reconstructions, blob strength and
edge strength are not sufficient independently; they are both needed.

Tony Lindeberg’s scheme of selecting features based upon feature strength 0 
computed as normalized scale-space derivatives corresponds to selecting points 
of maximal information under the minimal variance prior. We see this as an 
underpinning of his conceptually simple but effective scale selection mechanism. 

We have seen that the choice of prior is important when only very sparse 
information on the image has been sampled. However, when information sufficient
for creating reconstructions of high visual accuracy has been sampled, the
change of prior does not visually change the reconstruction. This is very natural. 
One way to put this is: For very little information sampled, the metameric class 
is large, and the different priors may choose representatives far from each other. 
When much information has been collected, the metameric class is small, and 
any sensible prior will essentially point to the same image. 

The conclusion (that feature strength carries sufficient information to visually
reproduce an image) shows that feature detection is not necessarily a matter
of information reduction, but merely a matter of changing into a representation
that may more easily subserve computation of visual tasks. If only feature position
is taken into account, we see a true reduction of information. As long as it is the
relevant information which has been extracted, feature detection communicating
only the position of features is then a sorting of information.

Abstract. We present theoretical and computational results that develop
Koenderink’s theory of feature analysis in human vision. Employing
a scale space framework, the method aims to classify image points
into one of a limited number of feature categories on the basis of local
derivative measurements up to some order. At the heart of the method is
the use of a family of functions, members of which can be used to account
for any set of image measurements. We will show how certain families of
simple functions naturally induce a categorical structure onto the space
of possible measurements. We present two such families suitable for 1D
images measured up to 2nd order, and various results relevant to similar
analysis of 2D images.



1 Introduction 



This paper is concerned with the analysis of models of feature detection in 2D 
within the framework of scale space theory [11217112) . 

The visual system has the task of measuring the retinal illuminance distri- 
bution. Since this is a continuous physical function, the space of possible illu- 
minance distributions is infinite dimensional. The visual system, however, can 
only perform a finite number of measurements. And so, in contrast, the space of 
possible measurements is only finite dimensional. 

The measurements that the visual system performs are physical operations 
which thus have certain limitations upon them; in particular, they cannot sample 
the illuminance at a zero-sized point of the visual field, but only over some 
aperture of non-zero size (or scale) . This has been formalized in the idea of scale 
space pim) . 

Scale space allows many types of analysis: within scale, between scale and 
so on. Here we are concerned with “local analysis”, i.e. at a single scale and 
location. This is often referred to as feature analysis. 

Koenderink’s insight into the problem of feature analysis is that the concept 
of metamerism is key. Metamerism is a phenomenon common to many
measurement systems - certainly imaging systems - whereby a finite number of 
measurements is insufficient to completely determine the physical reality, and 
thus there can be distinct physical inputs which are observationally equivalent. 



M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 51-^^ 2001. 
© Springer-Verlag and IEEE/CS 2001

In the remainder of the introduction we further review the background concepts
of scale space and metamerism. In the body of the paper we will introduce
our approach by presenting results on feature detection in 1D images and then
report progress on understanding feature detection in 2D images.



1.1 Scale Space 



Arguments have been made 1 1 i/I 1 2j that the visual system should sample retinal 
irradiance with Gaussian apertures of a range of scales. This leads to a scale space 
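representation of a signal $I(x, y)$ by $I : \mathbb{R}^2 \times \mathbb{R}^+ \to \mathbb{R}$, where the parameter
$t \in \mathbb{R}^+$ describes the current level of scale:

$$I(x, y; t) = (G_t * I)(x, y) \tag{1}$$

and

$$G_t(x, y) = \frac{1}{4 \pi t} \, e^{-(x^2 + y^2)/(4t)} \tag{2}$$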

This formalization also provides a solution to the problem of how to differentiate
physical functions: convolution with a scaled Gaussian derivative kernel
does both observation and differentiation in a single step, as shown by
the following relationship:

$$(-1)^n \, \partial^n G_t * I = G_t * \partial^n I. \tag{3}$$

We write $o_{mn}$ for the observation that results from measuring the $\partial_x^m \partial_y^n$ derivative
at a point (2D example). Thus,

$$o_{mn} = (-1)^{m+n} \, \partial_x^m \partial_y^n G_t * I \tag{4}$$

Hence observations ($o_{mn}$) are the numbers that result from performing measurements
by application of operators $(-1)^{m+n} \partial_x^m \partial_y^n G_t$ to illuminance distributions. For
example, $o_{00}$ will be the observation resulting from measurement with the
undifferentiated isotropic Gaussian kernel.
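The statement that a Gaussian derivative kernel observes and differentiates in one step can be checked numerically in 1D. The sketch below assumes the kernel $G_t(x) = (4\pi t)^{-1/2} e^{-x^2/(4t)}$ (variance 2t) and treats a measurement as an inner product with the kernel centred at the origin; the quadratic test signal and the grid are hypothetical choices.

```python
import numpy as np

def gaussian_derivative_kernels(x, t):
    """G_t and its first two derivatives, for G_t with variance 2t."""
    g = np.exp(-x ** 2 / (4.0 * t)) / np.sqrt(4.0 * np.pi * t)
    g1 = -x / (2.0 * t) * g
    g2 = (x ** 2 / (4.0 * t ** 2) - 1.0 / (2.0 * t)) * g
    return g, g1, g2

def observe(signal, x, t):
    """Observation vector (o_0, o_1, o_2) at the origin, o_n = (-1)^n <G_t^(n), signal>."""
    dx = x[1] - x[0]
    kernels = gaussian_derivative_kernels(x, t)
    return np.array([(-1) ** n * np.sum(k * signal) * dx
                     for n, k in enumerate(kernels)])

t = 1.5
x = np.linspace(-20.0, 20.0, 8001)
a, b, q = 0.7, -0.3, 0.25
signal = a + b * x + q * x ** 2          # hypothetical quadratic test signal
o = observe(signal, x, t)

# Blurred value, blurred slope and blurred curvature of the quadratic at 0.
expected = np.array([a + 2.0 * t * q, b, 2.0 * q])
```

For this quadratic the measurements equal the Gaussian-blurred signal and its first two derivatives at the origin, which is the content of the relationship above.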



1.2 Truncated Taylor Series 

While the primate visual system may use operators up to the or to the 
order US], in contemporary image processing one typically uses operators up to 
the 2nd order, producing the six observations $o_{ij}$, where $0 \le i + j \le 2$. Such
an ordered set of observations resulting from measurement with operators up to
some order we will refer to as an observation vector. The dimensionality of the
image space will thus be a function of the maximum order of measurement. For
1D images, we will have the $(n+1)$-tuple $(o_0, o_1, \ldots, o_n)$ and for 2D images the
$\tfrac{1}{2}(n+1)(n+2)$-tuple $(o_{00}, o_{10}, \ldots, o_{0n})$.
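The count of 2D observations can be made concrete by enumerating the derivative multi-indices. This is a small illustrative sketch; the function name is hypothetical and the ordering (by total order, then by increasing y-order) is one convenient choice.

```python
def jet_indices(order):
    """Multi-indices (i, j) with 0 <= i + j <= order, grouped by total order."""
    return [(k - j, j) for k in range(order + 1) for j in range(k + 1)]

# The six observations of the 2D, 2nd order jet.
indices = jet_indices(2)
```

The length of the list is (n+1)(n+2)/2 for maximum order n, giving six entries for n = 2.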




Features in Scale Space: Progress on the 2D 2nd Order Jet



1.3 Jets and Metamery Classes 

In standard mathematical usage the n-th order jet is defined to be the equivalence
class of functions that have the same n-truncated Taylor expansion at a
given point, i.e. when measured produce the same observation vector. Thus
observation vectors uniquely index jets¹, and so while different observation vectors
are necessarily due to different distributions, there are distinct illuminance
distributions which do correspond to one and the same observation vector. Such
indistinguishable pairs of patterns are referred to as “metamers”, which is
analogous to colour vision, where distinct spectra of light can give rise to the
same colour sensation.

The phenomenon of metamerism is pervasive and important, not least
because by characterizing what aspects of stimuli a measurement is blind to one
better understands what is actually being measured.



1.4 Representative Functions 

The problem now is to characterize these equivalence classes (the jets) of the
visual system and to classify them into distinct “feature types”, as categories
are not obvious in the space of observation vectors. Koenderink suggests looking
at functions associated with the observation vectors to find categories there. The
idea is to select a function from each jet to stand for the jet. For this strategy
to succeed one needs a selection rule that is guaranteed to pick one and only
one function from each jet. Koenderink builds upon work by Schrödinger
on colour vision to outline an elegant solution to this problem. The solution
involves selecting a representative function that takes on only two values and
has a transition locus between the two values that is particularly simple.

Koenderink then goes on to suggest that these particular representative func- 
tions can be subdivided into qualitatively distinct (“feature”) classes. While it 
is clear how to do this for 1D images and for 2D images measured up to
order, it is less clear how to proceed for higher dimensions and orders. 



1.5 Constraints on the Illuminance 

If there are constraints on the measured illuminance distribution, then they
may result in constraints on the observations. For example, if illuminance is
constrained to be non-negative, then the observation $o_{00}$ will always be non-negative.
One can use constraints to guide the selection of the function that will
be the representative of the jet.

Koenderink uses upper- and lower-bound limits on illuminance, which
he motivates by physical considerations. We, in contrast, do not enforce any such
constraints as we regard representative functions not as candidate explanations,
but merely as indices of jets. This decision arose from our study of the 1D, 2nd
order jet case; we noticed that if we consider a limited illuminance range then the
model found surprising metamers of the signal - especially near certain features
such as extrema, which were narrow relative to the scale of measurement. When
we do not enforce these constraints, we observe that metamers match the local
contrast better.

¹ Koenderink uses the following terminology: “picture” for physical irradiance distribution,
“image” for observation vector, and “icon” for jet.



2 Notes on 1D, 2nd Order Jet

In this section, we report on our analysis of feature detection in 1D images
before discussion of the more difficult case of 2D images. In 1D (2nd order jet),
an observation vector is the triple $(o_0, o_1, o_2)$.

For any observation vector, we consider the jet consistent with it, with the intention
of choosing a representative function from the jet. For this to be possible
unambiguously, we need to identify an entire family of representative functions
with the property that one and only one member of the family is contained in
each possible jet. This will only be possible if the family has the same dimensionality
as the observation vectors. So for the 1D, 2nd order jet considered here,
the family needs to be three-dimensional.

Koenderink proposed several alternative families of representative functions.
In all of his suggested solutions, the members of the families are functions
that attain at most two values and have at most two locations where they
change discontinuously between the two values. The locations of these transitions
provide two of the requisite three degrees of freedom of the family. The
remaining degree of freedom comes from choosing the values that the functions
attain. Despite there being two values to choose, only one degree of freedom
comes from this choice because in Koenderink’s solutions either (i) one of the
values is forced to be zero, (ii) one of the values is forced to be $illuminance_{max}$,
or (iii) the sum of the two values is forced to be $illuminance_{max}$.

We have implemented these solutions and found them to produce unexpected
representative functions for observation vectors with 0th order terms near 0 or
$illuminance_{max}$. The unexpected representative functions were of much greater
contrast than the local structure that they were accounting for. We trace the
problem to the use of illuminance constraints, and so have experimented with
two alternative families of solutions without such constraints. We find that the
representative functions obtained with these alternative families are always well
matched in contrast to the local structures that they explain. The two solutions
we have experimented with are:

• Only one transition point (1 d.o.f), but both values of the function uncon- 
strained (2 d.o.f.). 

• Two transition points but with a constraining relationship between them (1 
d.o.f.) and both values of the function unconstrained (2 d.o.f.). 

We will develop these two models in the following two subsections. 






2.1 Edge Representation 

Our first model employs as representative functions a family of step functions.
The family has three degrees of freedom: one degree of freedom ($\xi$) specifies the
location of the step while the other two degrees of freedom ($l$, $r$) specify the
values of the function either side of the step. Hence a member of the family is
given by $f_{\xi,l,r}(x) = l + \tfrac{1}{2}(r - l)\,(1 + \mathrm{sgn}(x - \xi))$. The 2nd order observation vector
resulting from measurement of $f_{\xi,l,r}$ is:

$$\left( \int_{\mathbb{R}} G_t(x) f_{\xi,l,r}(x) \, dx , \; -\int_{\mathbb{R}} G_t'(x) f_{\xi,l,r}(x) \, dx , \; \int_{\mathbb{R}} G_t''(x) f_{\xi,l,r}(x) \, dx \right) \tag{5}$$



To apply this model to the analysis of actual data, one first measures the data
at some location to produce a particular observation vector and then one finds
$\xi$, $l$, $r$ so that eqn. (5) equals the measured observation vector. Although such a
triple $\xi, l, r$ is guaranteed to exist and be unique, finding them is non-trivial as
we lack a closed form expression for them in terms of the observation vector;
instead we use a gradient descent routine. Results are presented in Fig. 1.
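The forward model that such a fitting routine has to evaluate can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the kernel is taken to be $G_t(x) = (4\pi t)^{-1/2} e^{-x^2/(4t)}$, the measurement is centred at the origin with the sign convention $o_n = (-1)^n \int G_t^{(n)} f \, dx$, and the names `step_edge_observation` and `residual` are hypothetical.

```python
import math

def step_edge_observation(xi, l, r, t):
    """Observation vector (o_0, o_1, o_2) of a step from l to r located at xi."""
    g = math.exp(-xi ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)
    # Fraction of the Gaussian aperture lying to the right of the step.
    right = 0.5 * math.erfc(xi / (2.0 * math.sqrt(t)))
    o0 = l + (r - l) * right
    o1 = (r - l) * g                       # step height weighted by G_t(xi)
    o2 = (r - l) * xi / (2.0 * t) * g      # equals -(r - l) * G_t'(xi)
    return (o0, o1, o2)

def residual(params, target, t):
    """Squared mismatch between the model and a measured observation vector."""
    xi, l, r = params
    model = step_edge_observation(xi, l, r, t)
    return sum((m - o) ** 2 for m, o in zip(model, target))

# A step centred on the measurement point: o_0 is the mean of the two values.
centred = step_edge_observation(0.0, 1.0, 3.0, 1.0)
```

A descent routine of the kind mentioned above would then minimize `residual` over the three parameters for each measured observation vector.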



2.2 Full Bar and Gap Representation

In the previous section we noted that using Koenderink’s suggested families of 
representative functions sometimes resulted in local approximations with sur- 
prisingly high contrast. Although we (in the main) cured this problem by using 
an alternative family of representative functions, we found the observation sug- 
gestive and wondered whether we could find even lower contrast approximations 
with a different family of functions. 

We have achieved this goal by considering functions with at most two transitions
and two values. This gives a four d.o.f. family, one more than is needed or
acceptable. This extra d.o.f. means that for any given observation vector, there
will be a one parameter family of functions that when measured produce the
observation vector. One can then ask the question: of this one parameter family,
which function is of the lowest contrast²,³, i.e. which has its two values closest
together? This problem is easily solved using the method of Lagrange multipliers,
and produces a pleasingly simple result: if the two transition points of the lowest
contrast solution are at $x = \alpha$ and $x = \beta$, then $\alpha\beta = -2t$ (where t is the scale of
the measurements producing the observation vector). This condition is already

² A standard definition of contrast is $C = (I_{max} - I_{min}) / (I_{max} + I_{min})$ (Michelson
contrast), where $I_{max}$ and $I_{min}$ are the limits of the intensity in the image. Instead,
we use the simple definition $C = I_{max} - I_{min}$.
³ As Koenderink has observed, there are many criteria one could choose. We make
no claim, at this stage, for any special advantages to the minimum contrast condition.
Further work considering the effects of different conditions will be necessary to settle
this.

Fig. 1. 1D (2nd Order Jet): Edge Representation. In the top row is shown the
original ID signal at a particular scale (a) and its scale space representation (b) . 
In (c) we have the representation that follows from the application of the model 
to the signal in (a). The algorithm approximates the signal at each point by 
using a right or left step edge, or a uniform function as indicated by the varying 
greylevel of the thick curve. The strongly marked curve segments are the locus of 
the step edge height midpoints and so give some clue as to where the step edges 
are located. Uniform functions have been chosen as the local model whenever 
the transition of the step edge found by the algorithm is too distant from the 
point being analysed. If uniform functions are not used and instead all points 
are modelled as left or right step edges, the characteristic ’tick’ formation at the 
end of many of the strongly marked curves would be even more pronounced. In 
(d) the same approximation is applied across scale. 



familiar to us from Koenderink’s work on colour and spatial vision; in particular
Ostwald’s theory of semi-chromes. Following Koenderink’s nomenclature, we
will refer to the condition $\alpha\beta = -2t$ as the “full pattern” condition⁴.
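One way to see where the condition $\alpha\beta = -2t$ can come from is sketched below. This is offered as an illustration rather than as the authors' derivation, and it rests on two assumptions not spelled out in this section: the kernel is $G_t(x) = (4\pi t)^{-1/2} e^{-x^2/(4t)}$, and the two-valued function equals $a + c$ on $(\alpha, \beta)$ and $a$ elsewhere, so that its contrast is $|c|$.

```latex
% Assumed kernel and its derivatives:
%   G_t'(x)  = -\frac{x}{2t} G_t(x), \qquad
%   G_t''(x) = \frac{1}{2t}\left(\frac{x^2}{2t} - 1\right) G_t(x).
\begin{align*}
o_1 &= -\int G_t' f \, dx = c \, [\, G_t(\alpha) - G_t(\beta) \,], \\
o_2 &= \int G_t'' f \, dx = c \, [\, G_t'(\beta) - G_t'(\alpha) \,].
\end{align*}
% With o_1 and o_2 held fixed, c can only be stationary if the Jacobian of
% (o_1 / c, \, o_2 / c) with respect to (\alpha, \beta) is singular:
\begin{align*}
0 &= G_t'(\alpha)\, G_t''(\beta) - G_t'(\beta)\, G_t''(\alpha) \\
  &\propto -\alpha \left( \frac{\beta^2}{2t} - 1 \right)
           + \beta \left( \frac{\alpha^2}{2t} - 1 \right)
   = (\alpha - \beta) \left( 1 + \frac{\alpha \beta}{2t} \right),
\end{align*}
% hence \alpha \beta = -2t whenever \alpha \neq \beta.
```

The 0th order observation does not enter, since the offset $a$ can absorb any change in it without affecting the contrast.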

Having understood how (at least for the 1D, 2nd order jet) the “minimum contrast”
criterion leads to the “full pattern” condition we can now propose a novel

⁴ Again some terms come from the analogy with color theory; in this case “full patterns”
recall Schrödinger’s “full colors”, i.e. the ones with the maximum of monochromatic
content.
three-dimensional family of functions suitable for indexing the 1D, 2nd order jet.
Such functions attain at most two unconstrained values (two d.o.f.) and they
have at most two points of transition that must satisfy the full pattern condition (one
d.o.f.). We illustrate the use of this family of functions in Fig. 2.




Fig. 2. 1D (2nd Order Jet): Full Bar and Gap Representation. Shows the 1D
2nd order technique of section 2.2 applied to a 2D image. We apply it by
measuring the 0th, 1st and 2nd derivatives only in the gradient direction. These
three numbers then form the input to our 1D analysis. The original image is
of the algorithm to a selected point. The radius of the circle, that encloses the 
neighborhood of the selected point, is proportional to the square root of the 
measurement scale (t = 4) at which the analysis is performed. The signal is 
approximated by a gap (b) or a bar (d). Although the approximating function 
in (c) is a bar, it appears to be an edge because the second transition falls outside 
of the circle. 



3 Notes on 2D, 2nd Order Jet

We now consider the case of 2D images (2nd order jet), for which an observation
vector is six-dimensional: $(o_{00}, o_{10}, o_{01}, o_{20}, o_{11}, o_{02})$.

Koenderink has suggested that adequate candidates to be representative
functions are the binary-valued functions with a transition locus that is a conic
curve. This is an attractive suggestion as it produces categorical feature classes
as desired. Each conic type corresponds to a different feature type: ellipses to
blobs, parabolas to corners and hyperbolas to necks or gaps.

The family of such functions has seven degrees of freedom, five coming from
the conic transition locus and two from the function values. The observation
space is six-dimensional, so we have an extra degree of freedom. As we did in
1D, we can use a minimum contrast condition to eliminate the extra degree of
freedom.

In 1D (2nd order jet), we found that the minimum contrast condition led
to the “full pattern” condition. Our guess is that something similar is true for
conics and we explore this in the following subsections.



3.1 Relatedness of “Complementary” - “Scale Relation” - “Full
Patterns” Conditions

In the case of the 1D, 2nd order jet, we found that the “minimum contrast”
condition is equivalent to a “full pattern” condition. We want to clarify what
this “full pattern” condition means in 2D; in order to do this, we start by defining
a “complementary” condition, we then introduce the concept of a “scale
relationship” and finally we show that all these conditions are related.

The concepts we are going to introduce apply either to sets of points or to
algebraic curves; we want to define them and study their inter-relationships and
equivalence. We begin with the definition of the “complementary” condition.

Let us introduce a vector of Gaussian derivatives up to the 2nd order⁵:



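$$g_t(x, y) = \left( G_t^{(1,0)}(x, y), \; G_t^{(0,1)}(x, y), \; G_t^{(2,0)}(x, y), \; G_t^{(1,1)}(x, y), \; G_t^{(0,2)}(x, y) \right)^T \tag{6}$$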



Complementary Condition. A set of 5 points⁶ $P_i = (x_i, y_i)$ (in the domain of
the image) is complementary if and only if one can put weighted delta functions
at those points that when measured give the achromatic axis, a vector with zero
entries in all the slots but the first one, corresponding to the 0th order element⁷;
i.e. $\exists\, w_i$ such that:

$$\sum_{i=1}^{5} w_i \, o[\delta(P_i)] = 0 \tag{7}$$

where $o[\delta(P_i)] = g_t(P_i)$. Calling $w$ the vector of weights and $M$ the matrix with
$o[\delta(P_i)]$ as columns, we can rewrite eqn. (7) as:

$$M \cdot w = 0 \tag{8}$$

⁵ We do not consider the element corresponding to the 0th order.
⁶ In an $\Omega$-dimensional observation system ($\Omega = 6$ for the case of 2D, 2nd order jet),
we consider a set of $(\Omega - 1)$ points.
⁷ In our case it is the 5-dimensional null vector, as we do not consider the 0th order.

which admits non-trivial solutions if and only if $|M| = 0$. Using elementary properties
of determinants together with the fact that Gaussian derivatives are Hermite
polynomials multiplied by the original Gaussian, the complementary condition
expressed in (7) becomes:

$$\begin{vmatrix}
-x_1 & -x_2 & -x_3 & -x_4 & -x_5 \\
-y_1 & -y_2 & -y_3 & -y_4 & -y_5 \\
\frac{x_1^2}{2t} - 1 & \frac{x_2^2}{2t} - 1 & \frac{x_3^2}{2t} - 1 & \frac{x_4^2}{2t} - 1 & \frac{x_5^2}{2t} - 1 \\
\frac{x_1 y_1}{2t} & \frac{x_2 y_2}{2t} & \frac{x_3 y_3}{2t} & \frac{x_4 y_4}{2t} & \frac{x_5 y_5}{2t} \\
\frac{y_1^2}{2t} - 1 & \frac{y_2^2}{2t} - 1 & \frac{y_3^2}{2t} - 1 & \frac{y_4^2}{2t} - 1 & \frac{y_5^2}{2t} - 1
\end{vmatrix} = 0 \tag{9}$$


On the other hand, five points uniquely identify a conic; a good question is whether
there is anything special about the conic corresponding to points in complementary
position. Firstly, we have found that if there exist five points of a conic
which are in complementary position, then any other five points of that conic are
in complementary position. That is what we call a “complementary” conic. A
conic is also described by five parameters, and we have found that the parameters
of a “complementary” conic satisfy a specific condition, the “scale relationship”,
which we are going to define in the next subsection.

Scale Relationship. Let us consider a conic $\chi$ of the form 
$a_{00} + a_{10}x + a_{01}y + a_{20}x^2 + a_{11}xy + a_{02}y^2 = 0$. If it is a 
“complementary conic”, it is easy to show⁶ that eqn. (9), referred to any five 
points of $\chi$, is equivalent to the following equation for the conic parameters: 

$$
2t(a_{20} + a_{02}) + a_{00} = 0 \tag{10}
$$

which we call the “scale relationship”. In the case of the 1D, 2nd order jet it 
corresponds to $\alpha\beta = -2t$ (see sect. 2.2; there, however, we used transition 
point coordinates instead of curve parameters). 
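
The equivalence between the determinant condition and the scale relationship can be checked numerically. The following is a minimal sketch, not part of the original derivation; the helper names `det`, `complementary_matrix` and `circle_points` are hypothetical. It uses a circle $x^2 + y^2 = R^2$, for which $a_{20} = a_{02} = 1$ and $a_{00} = -R^2$, so that the scale relationship reduces to $R^2 = 4t$. The matrix rows are $-x_i$, $-y_i$, $x_i^2/(2t) - 1$, $x_i y_i/(2t)$ and $y_i^2/(2t) - 1$, with one column per point.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def complementary_matrix(points, t):
    """5x5 matrix of the determinant condition; one column per point."""
    return [
        [-x for x, y in points],
        [-y for x, y in points],
        [x * x / (2 * t) - 1 for x, y in points],
        [x * y / (2 * t) for x, y in points],
        [y * y / (2 * t) - 1 for x, y in points],
    ]

def circle_points(radius, n=5, phase=0.3):
    """n equally spaced points on a circle centred at the origin."""
    return [(radius * math.cos(phase + 2 * math.pi * k / n),
             radius * math.sin(phase + 2 * math.pi * k / n))
            for k in range(n)]

t = 1.0
# Circle with R^2 = 4t: satisfies the scale relationship.
det_complementary = det(complementary_matrix(circle_points(2 * math.sqrt(t)), t))
# Circle with R^2 = 1 != 4t: violates the scale relationship.
det_generic = det(complementary_matrix(circle_points(1.0), t))
print(det_complementary, det_generic)
```

Under these assumptions the first determinant vanishes up to rounding error, while the second one does not.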

We want to show that equation (10) is also true for “full patterns”, a term 
coined by Koenderink in analogy with a concept from color science. 

⁶ We just sketch the proof here. Using properties of determinants, we multiply 
each row by the suitable conic parameter (different from zero!) and then replace 
one row by the linear combination of all the others. This turns out to be the equation 
of the conic, which is zero for all the points, plus the extra terms $2t(a_{20} + a_{02}) + a_{00}$, 
which are the same in all the columns. So, if this last part is zero we have an entire 
row of zeros, and the determinant is zero. 
Full Pattern Condition. We have been considering binary-valued functions 
with conic transitions. When measured, these produce an observation vector $\mathbf{o}$. 
As the conic parameters are varied while holding the two function values 
constant, the observation vector varies, sweeping out a 5-dimensional submanifold. 

A conic is full when the tangent space to the submanifold contains the 
achromatic axis, i.e. there exist weights $w_i$, not all zero, such that: 

$$
\sum_{i} w_i \, \frac{\partial \mathbf{o}}{\partial a_i} = \mathbf{0} \tag{11}
$$



where the $a_i$ are conic parameters⁷. If we introduce a matrix $T$, the columns of 
which are the derivatives of the observation vector⁸ with respect to each conic 
parameter, we can rewrite eqn. (11) as: 

$$
T \cdot \mathbf{w} = \mathbf{0} \tag{12}
$$

and this has a non-trivial solution if and only if $|T| = 0$. If we notice the 
similarity between derivatives of conics with respect to the parameters and 
Hermite polynomials in Gaussian derivatives, we can then follow a procedure 
similar to the one used for the complementary condition (the difference is that 
the matrix elements are still double integrals). We obtain that $|T| = 0$ is also 
equivalent to the “scale relationship”. 

We have found that both the “complementary” condition and the “full pattern” 
condition are equivalent to eqn. (10). We will use this fact in the next 
section, where we show that minimum contrast, amongst conic patterns, leads to 
full patterns. 

3.2 Minimum Contrast and Full Patterns 

Let $\chi$ be the conic function and $r$, $s$ be the two values of the binary-valued 
function we will select as representative of the jet; ignoring the 0th order term, 
the observation vector of our model is defined as: 

$$
\mathbf{o}[\chi(x,y)] = (s - r) \iint_{\mathbb{R}^2} \mathbf{g}_t(x,y) \cdot A[\chi(x,y)] \, dx \, dy \tag{13}
$$

and it has to be equal to the five-dimensional vector of measurements $\mathbf{m}$. Now we 
want to impose the minimum contrast condition; we use the method of Lagrange 
multipliers, which gives the following system of equations: 

$$
\frac{\partial}{\partial \alpha_j} \Bigl( (r - s)^2 + \boldsymbol{\lambda} \cdot (\mathbf{o} - \mathbf{m}) \Bigr) = 0, \qquad j = 1, 2, \ldots, 7 \tag{14}
$$

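⁷ $i = 10, 01, 20, 11, 02$. 

⁸ The derivatives of the observation vector with respect to the conic parameters factor 
out the conic variables and a delta function. For example, 
$\frac{\partial}{\partial a_i} \iint_{\mathbb{R}^2} \mathbf{g}_t(x,y) \, A[\chi(x,y)] \, dx \, dy = \iint_{\mathbb{R}^2} c_i \, \delta[\chi(x,y)] \, \mathbf{g}_t(x,y) \, dx \, dy$ 
and $\mathbf{c} = (x, y, x^2, xy, y^2)$. 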
where $\alpha_j = r, s, a_{10}, \ldots, a_{02}$. If we focus on the case $\alpha_j \neq r, s$ and we replace the 
measurement vector with the observation vector, we can reduce the system (14) 
to: 

$$
\boldsymbol{\lambda} \cdot \frac{\partial \mathbf{o}}{\partial a_i} = 0 \quad \forall i, \; i = 10, 01, \ldots, 02 \tag{15}
$$

But these are the same as the “full” equations (11), the five $a_i$ being in fact the 
parameters of the conic; it follows that they are also equivalent to the 
“complementary” condition and to the “scale relationship”. 

4 Conclusions and Discussions 

In this paper, we presented methods for feature analysis of the 2nd order jet, 
for both 1D and 2D images. Using a scale-space framework, we considered 
derivative measurements up to a certain order at any given point of an image (and at a 
particular scale). We aimed to classify image points, on the basis of such 
measurements, into one of a small number of feature categories. From Koenderink’s 
theory, based on an analogy with colour vision, the key concept is to identify a 
family of functions, elements of which are natural candidates to account for any 
set of image measurements. These functions are characterized by being 
binary-valued with constrained transition loci between the two values. The constraints 
are such that in 1D the functions have at most two points of transition; in 2D 
the transition locus is defined by a conic. As a criterion is needed to select a 
representative among the family of functions, we chose the “minimum contrast” 
condition, for both the 1D and the 2D case. We proved that the solutions picked 
out by it also satisfy the “full pattern” condition and the “scale relationship”. 
In 1D this is expressed by the fact that the product of the values of the 
transition points is equal to −2t, where t is the measurement scale. This allows us 
to implement our feature analysis in a simple way, and the results are shown in 
the accompanying figure. In 2D the scale relationship is $2t(a_{20} + a_{02}) + a_{00} = 0$, 
a simple relation between the conic parameters. 

Abstract. This paper presents an approach for simultaneous tracking 
and recognition of hierarchical object representations in terms of multi- 
scale image features. A scale-invariant dissimilarity measure is proposed 
for comparing scale-space features at different positions and scales. Based 
on this measure, the likelihood of hierarchical, parameterized models 
can be evaluated in such a way that maximization of the measure over 
different models and their parameters allows for both model selection and 
parameter estimation. Then, within the framework of particle filtering, 
we consider the area of hand gesture analysis, and present a method for 
simultaneous tracking and recognition of hand models under variations 
in the position, orientation, size and posture of the hand. In this way, 
qualitative hand states and quantitative hand motions can be captured, 
and be used for controlling different types of computerised equipment. 



1 Introduction 

When representing real-world objects, an important constraint originates from 
the fact that different types of image features will usually be visible depending 
on the scale of observation. Thus, when building object models for recognition, it 
is natural to consider hierarchical object models that explicitly encode features 
at different scales as well as hierarchical relations over scales between these. 

The purpose of this paper is to address the problem of how to evaluate 
such hierarchical object models with respect to image data. Specifically, we will 
be concerned with graph-like and qualitative image representations in terms of 
multi-scale image features (Crowley and Sanderson 1987, Lindeberg 1993, Pizer 
et al. 1994, Triesch and von der Malsburg 1996, Shokoufandeh et al. 1999, Bret- 
zner and Lindeberg 1999), which are expressed within a context of feature de- 
tection with automatic scale selection. A dissimilarity measure will be proposed 

* The support from the Swedish Research Council for Engineering Sciences, TFR, 
the Royal Swedish Academy of Sciences and the Knut and Alice Wallenberg Foun- 
dation is gratefully acknowledged. We also thank Lars Bretzner for many valuable 
suggestions concerning this work and for his help in setting up the experiments. 
for comparing such model features to image data, and we will use this measure 
for evaluating the likelihood of object models. 

Then, within the paradigm of stochastic particle filtering (Isard and Blake 
1996, Black and Jepson 1998, MacCormick and Isard 2000), we will show how 
this approach allows us to simultaneously align, track and recognise hand models 
in multiple states. The approach will be applied to hand gesture analysis, and we 
will demonstrate how a combination of qualitative hand states and quantitative 
hand motions captured in this way allows us to control computerised equipment. 

2 Hand Model and Image Features 

Given an image of a hand, we can expect to detect a blob feature at a coarse scale 
corresponding to the palm, while fingers and finger tips may appear as ridge and 
blob features, respectively, at finer scales. Here, we follow the approach of feature 
detection with automatic scale selection (Lindeberg 1998), and detect image 
features from local extrema over scales of normalized differential invariants. 



2.1 Detection of Image Features 

Given an image $f$ with scale-space representations $L(\cdot; t) = g(\cdot; t) * f(\cdot)$, 
constructed by convolution with Gaussian kernels $g(\cdot; t)$ with variance $t$, a 
scale-space maximum of a normalized differential entity $\mathcal{D}_{norm} L$ is a point $(x; t)$ where 
$\mathcal{D}_{norm} L(x; t)$ assumes a local maximum with respect to space $x$ and scale $t$. To 
detect multi-scale blobs, we search for points $(x; t)$ that are local maxima in 
scale-space of the normalized squared Laplacian 

$$
\mathcal{B}_{\gamma-norm} L = (t \, \nabla^2 L)^2 = t^2 (\partial_{xx} L + \partial_{yy} L)^2 \tag{1}
$$

while multi-scale ridges are detected as scale-space extrema of the following 
normalized measure of ridge strength 

$$
\mathcal{R}_{\gamma-norm} L = t^{2\gamma} \bigl( (\partial_{xx} L - \partial_{yy} L)^2 + 4 (\partial_{xy} L)^2 \bigr), \tag{2}
$$

where $\gamma = 3/4$. Each feature detected at a point $(x; t)$ in scale-space indicates 
the presence of a corresponding image structure at position $x$ having size $t$. 
To represent the spatial extent of such image structures, we evaluate a second 
moment matrix in the neighborhood of $(x; t)$ 



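$$
\nu = \int_{\eta \in \mathbb{R}^2}
\begin{pmatrix}
(\partial_x L)^2 & (\partial_x L)(\partial_y L) \\
(\partial_x L)(\partial_y L) & (\partial_y L)^2
\end{pmatrix}
g(\eta; s_{int}) \, d\eta \tag{3}
$$

at integration scale $s_{int}$ proportional to the scale of the detected features. 
Graphically, this image descriptor is then represented by an ellipse centered at $x$ and 
with covariance matrix $\Sigma = t \, \nu_{norm}$, where $\nu_{norm} = \nu / \lambda_{min}$ and $\lambda_{min}$ is the 
smallest eigenvalue of $\nu$. Figures 1(a)-(b) show such descriptors obtained from 
an image of a hand. 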

An extension of this approach to colour feature detection is presented in 
(Sjobergh and Lindeberg 2001). 
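
As a worked illustration of scale selection with the normalized squared Laplacian, consider an idealised input that is itself a Gaussian blob of variance $t_0$ (an assumption made here for the sake of a closed-form example, not a case treated in the text). Smoothing with a Gaussian of variance $t$ gives a Gaussian of variance $t_0 + t$, whose Laplacian at the centre is $-1/(\pi (t_0 + t)^2)$, so the response at the centre is $t^2 / (\pi^2 (t_0 + t)^4)$, which is maximal over scales at $t = t_0$. The function name `normalized_blob_response` is hypothetical.

```python
import math

def normalized_blob_response(t, t0):
    """t^2 (Laplacian of L)^2 at the centre of a Gaussian blob of variance t0."""
    s = t0 + t                              # variances add under Gaussian smoothing
    laplacian = -1.0 / (math.pi * s * s)    # Laplacian of a 2-D Gaussian at its centre
    return (t * laplacian) ** 2

t0 = 4.0
scales = [0.05 * k for k in range(1, 401)]  # candidate scales 0.05 ... 20.0
selected = max(scales, key=lambda t: normalized_blob_response(t, t0))
print(selected)
```

In this idealised setting the selected scale coincides with the variance of the blob, which is the sense in which a detected feature at $(x; t)$ indicates an image structure of size $t$.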



Fig. 1. Blob and ridge features for a hand: (a) circles and ellipses corresponding to 
the significant blob and ridge features extracted from an image of a hand; (b) selected 
image features corresponding to the palm, the fingers and the finger tips of a hand; 
(c) a mixture of Gaussian kernels associated with the blob and ridge features, which 
illustrates how the selected image features capture the essential structure of a hand. 



2.2 Hierarchical and Graph-Like Hand Models 

One idea that we shall explore here is to consider relations in space and over 
scales between such image features as an important cue for recognition. To model 
such relations, we shall consider graph-like object representations, where the ver- 
tices in the graph correspond to features and the edges define relations between 
different features. This approach continues along the works by (Crowley and 
Sanderson 1987) who extracted peaks from a Laplacian pyramid of an image 
and linked them into a tree structure with respect to resolution, (Lindeberg 
1993) who constructed a scale-space primal sketch with an explicit encoding of 
blob-like structures in scale-space as well as the relations between these, (Tri- 
esch and von der Malsburg 1996) who used elastic graphs to represent hands in 
different postures with local jets of Gabor filters computed at each vertex, (Shok- 
oufandeh et al. 1999) who detected maxima in a multi-scale wavelet transform, 
as well as (Bretzner and Lindeberg 1999), who computed multi-scale blob and 
ridge features and defined explicit qualitative relations between these features. 

Specifically, we will make use of quantitative relations between features to 
define hierarchical, probabilistic models of objects in different states. For a hand, 
the feature hierarchy will contain three levels of detail: a blob corresponding to 
the palm at the top level, ridges corresponding to the fingers at the intermediate 
level and blobs corresponding to the finger-tips at the bottom level (see figure 2). 
While a more general approach for modelling the internal state of a hand 
consists of modelling the probability distribution of the parameters over all object 
features, we will here simplify this task by approximating the relative scales 
between all features by constant ratios and by fixing the relative positions 
between the ridges corresponding to the fingers and the blobs corresponding to the 
finger-tips. Thus, we model the global position $(x, y)$ of the hand, its overall size 
$s$ and orientation $\alpha$. Moreover, we have a state parameter $l = 1 \ldots 5$ describing 
the number of open fingers present in the hand posture (see figure 2(b)). In this 
way, a hand model can be parameterised by $X$, where $X = (x, y, s, \alpha, l)$. 
Fig. 2. Model of a hand in different states: (a) hierarchical configuration of model 
features and their relations; (b) model states corresponding to different hand postures. 



3 Evaluation of Object Model 

To recognize and track hands in images, we will use a maximum-likelihood 
estimate and search for the model hypothesis $X_0$ that, given an image $I$, maximizes 
the likelihood $p(I | X_0)$. There are several ways to define such a likelihood. One 
approach could be to relate the model features directly to local image patches. 
Here, we will measure the dissimilarity between the features in the model and 
the features extracted from image data. 

3.1 Dissimilarity between Two Features 

Consider an image feature $F$ (either a blob or a ridge), defined in terms of a 
position $\mu$ and a covariance matrix $\Sigma$ according to section 2.1. The dissimilarity 
between two such features must take into account the difference in their 
position, size, orientation and anisotropy. To measure the joint dissimilarity of these 
features, we propose to model each such image feature by a two-dimensional 
Gaussian function having the same mean and covariance as the original feature 

$$
\bar{g}(x, \mu, \Sigma) = h(\Sigma) \, g(x, \mu, \Sigma) = \frac{h(\Sigma)}{2\pi \sqrt{\det(\Sigma)}} \, e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)} \tag{4}
$$

and compute the integrated square difference between two such representations 

$$
\phi(F_1, F_2) = \int_{\mathbb{R}^2} \bigl( \bar{g}(x, \mu_1, \Sigma_1) - \bar{g}(x, \mu_2, \Sigma_2) \bigr)^2 \, dx \tag{5}
$$

given a normalising factor $h(\Sigma)$, which will be determined later so as to give 
a scale-invariant dissimilarity measure. The choice of a Gaussian function is 
natural here, since it is the function that maximizes the entropy of a random 
variable given its mean and covariance. The Gaussian function at each image 
point can also be thought of as measuring the contribution of this point to the 
image feature. Figure 1(c) illustrates features of a hand represented in this way. 

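Using the fact that the product of two Gaussian functions is another scaled 
Gaussian function with covariance $\hat{\Sigma} = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}$ and mean 
$\hat{\mu} = \hat{\Sigma} (\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2)$, the integral in (5) can be evaluated in closed form: 

$$
\phi(F_1, F_2) = \frac{h^2(\Sigma_1)}{4\pi \sqrt{\det(\Sigma_1)}} + \frac{h^2(\Sigma_2)}{4\pi \sqrt{\det(\Sigma_2)}} - C \, \frac{h(\Sigma_1) h(\Sigma_2) \sqrt{\det(\Sigma_1^{-1} \Sigma_2^{-1})}}{\pi \sqrt{\det(\Sigma_1^{-1} + \Sigma_2^{-1})}} \tag{6}
$$

where 

$$
C = \exp \Bigl( -\tfrac{1}{2} \bigl( \mu_1^T \Sigma_1^{-1} \mu_1 + \mu_2^T \Sigma_2^{-1} \mu_2 - (\mu_1^T \Sigma_1^{-1} + \mu_2^T \Sigma_2^{-1}) \hat{\mu} \bigr) \Bigr).
$$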



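To be useful in practice, $\phi$ should be invariant to the joint translations, rotations 
and size variations of both features. From (6), it can be seen that $\phi(F_1, F_2)$ will 
be scale-invariant if and only if we choose $h(\Sigma) = \sqrt[4]{\det(\Sigma)}$ (up to a constant 
factor). Thus, we obtain 

$$
\phi(F_1, F_2) = \frac{1}{2\pi} - C \, \frac{\sqrt[4]{\det(\Sigma_1^{-1}) \det(\Sigma_2^{-1})}}{\pi \sqrt{\det(\Sigma_1^{-1} + \Sigma_2^{-1})}} \tag{7}
$$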



It is easy to prove that the dissimilarity measure $\phi$ in (7) is invariant to joint 
rescalings of both features, i.e. $\phi(F_1, F_2) = \phi(\tilde{F}_1, \tilde{F}_2)$, where $\tilde{F}(\mu, \Sigma) = 
F(k\mu, k^2 \Sigma)$ for some scaling factor $k$. Moreover, $\phi$ is invariant to 
simultaneous translations and rotations of both features. As illustrated in figure 3, the 
dissimilarity measure $\phi$ assumes its minimum value zero only when the features 
are equal, while its value increases when the features start to deviate in position, 
size or shape. 
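
The closed form of the scale-invariant measure, $\phi = \frac{1}{2\pi} - C \, \sqrt[4]{\det(\Sigma_1^{-1}) \det(\Sigma_2^{-1})} \, / \, \bigl( \pi \sqrt{\det(\Sigma_1^{-1} + \Sigma_2^{-1})} \bigr)$, is straightforward to evaluate. The sketch below is one possible plain-Python transcription, with 2×2 matrices stored as nested tuples; all function names are hypothetical and the numerical values are made up for illustration.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def dissimilarity(mu1, cov1, mu2, cov2):
    """Scale-invariant dissimilarity between two Gaussian feature descriptors."""
    p1, p2 = inv2(cov1), inv2(cov2)                      # inverse covariances
    psum = tuple(tuple(p1[i][j] + p2[i][j] for j in range(2)) for i in range(2))
    b1, b2 = matvec(p1, mu1), matvec(p2, mu2)
    b = (b1[0] + b2[0], b1[1] + b2[1])
    mu_hat = matvec(inv2(psum), b)                       # mean of the product Gaussian
    q = dot(mu1, b1) + dot(mu2, b2) - dot(b, mu_hat)
    c = math.exp(-0.5 * q)
    num = (det2(p1) * det2(p2)) ** 0.25
    return 1.0 / (2.0 * math.pi) - c * num / (math.pi * math.sqrt(det2(psum)))

mu_a, cov_a = (1.0, 2.0), ((4.0, 1.0), (1.0, 2.0))
mu_b, cov_b = (2.0, 1.5), ((3.0, 0.0), (0.0, 5.0))

phi_same = dissimilarity(mu_a, cov_a, mu_a, cov_a)
phi_diff = dissimilarity(mu_a, cov_a, mu_b, cov_b)

# Joint rescaling by k: positions scale with k, covariances with k^2.
k = 3.0
scale_mu = lambda mu: (k * mu[0], k * mu[1])
scale_cov = lambda cov: tuple(tuple(k * k * e for e in row) for row in cov)
phi_scaled = dissimilarity(scale_mu(mu_a), scale_cov(cov_a),
                           scale_mu(mu_b), scale_cov(cov_b))
print(phi_same, phi_diff, phi_scaled)
```

Identical features give a value of zero up to rounding, and the value for the two different features is unchanged by the joint rescaling, in line with the invariance properties stated above.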




Fig. 3. Two model features (solid ellipses) and two data features (dashed ellipses) in 
(a) are compared by evaluating the square difference of associated Gaussian functions. 
While the overlapping model (A) and the data (B) features cancel each other, the 
mismatched features (C and D) increase the square difference in (b). 



3.2 Dissimilarity of Model and Data Features 

Given two sets $F^m$, $F^d$ with $N^m$ model and $N^d$ data features respectively, we 
consider the model and the data as two mixtures of Gaussian distributions 

$$
G^m = \sum_{i=1}^{N^m} \bar{g}(x, \mu_i^m, \Sigma_i^m), \qquad G^d = \sum_{i=1}^{N^d} \bar{g}(x, \mu_i^d, \Sigma_i^d),
$$

where $\bar{g}(x, \mu_i^m, \Sigma_i^m)$ and $\bar{g}(x, \mu_i^d, \Sigma_i^d)$ are normalized Gaussian functions 
associated with model and data features as defined in (4). In analogy with the 
dissimilarity between two features, we define the dissimilarity between the model and 
the data by integrating the square difference of their associated functions: 

$$
\Phi(F^m, F^d) = \int_{\mathbb{R}^2} (G^m - G^d)^2 \, dx. \tag{8}
$$

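By expanding (8) we get 

$$
\Phi(F^m, F^d) = \underbrace{\sum_{i=1}^{N^m} \sum_{j=1}^{N^m} \int_{\mathbb{R}^2} \bar{g}_i^m \bar{g}_j^m \, dx}_{Q_1} + \underbrace{\sum_{i=1}^{N^d} \sum_{j=1}^{N^d} \int_{\mathbb{R}^2} \bar{g}_i^d \bar{g}_j^d \, dx}_{Q_2} - \underbrace{2 \sum_{i=1}^{N^m} \sum_{j=1}^{N^d} \int_{\mathbb{R}^2} \bar{g}_i^m \bar{g}_j^d \, dx}_{Q_3}
$$

whose computation requires comparisons of all feature pairs. We can note, 
however, that overlap between the features within a model will be rare, as will 
overlap between features in the data. Therefore, we make the approximations 

$$
Q_1 \approx \sum_{i=1}^{N^m} \int_{\mathbb{R}^2} (\bar{g}_i^m)^2 \, dx, \qquad
Q_2 \approx \sum_{i=1}^{N^d} \int_{\mathbb{R}^2} (\bar{g}_i^d)^2 \, dx, \qquad
Q_3 \approx 2 \sum_{i=1}^{N^m} \int_{\mathbb{R}^2} \bar{g}_i^m \bar{g}_{k_i}^d \, dx, \tag{9}
$$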



where $\bar{g}_{k_i}^d$ corresponds to the data feature $F_{k_i}^d$ closest to the model feature $F_i^m$ 
with regard to the dissimilarity measure $\phi$. In summary, we approximate $\Phi$ by 

$$
\Phi(F^m, F^d) \approx \sum_{i=1}^{N^m} \phi(F_i^m, F_{k_i}^d) + \frac{N^d - N^m}{4\pi}, \tag{10}
$$

where $\phi$ is the dissimilarity measure between a pair of features according to 
(7), $F_i^m$, $i = 1..N^m$ are the features of the model and $F_{k_i}^d$, $i = 1..N^m$ are the 
data features, where $F_{k_i}^d$ matches best with $F_i^m$ among the data features. 

The dissimilarity measure $\Phi$ characterizes the deviation between model and 
data features. It is dual in the sense that it considers the distance from model 
features to data features (offset criterion) as well as the distance from data 
features to model features (outlier criterion). The simultaneous optimization with 
respect to these two criteria is important for locating an object and recognizing 
it among the others. To illustrate this, consider the matching of a hand model 
in states with one, two and three open fingers, $l = 1, 2, 3$ (see figure 2(b)), to 
an image of a hand as shown in figure 1(a). If we match according to an offset 
criterion only, hypotheses with one and two open fingers ($l = 1, 2$) will have the 
same fitting error as a hypothesis with three open fingers ($l = 3$). Thus, the 
offset criterion alone is not sufficient for the correct selection of a hand state. To 
solve the problem, we must require the best hypothesis to also explain as much 
of the data as possible by minimizing the number of mismatched data features 
(outlier criterion). This will result in a hypothesis that best fits and explains the 
data, i.e. the hypothesis with the correct state $l = 3$. 
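
The interplay of the offset and outlier criteria can be illustrated with a small sketch of the approximation $\Phi \approx \sum_i \phi(F_i^m, F_{k_i}^d) + (N^d - N^m)/(4\pi)$. To keep the example short, it assumes isotropic features $(x, y, s)$ with covariance $s$ times the identity, for which the pairwise measure reduces to $\phi = \frac{1}{2\pi} - \frac{\sqrt{s_1 s_2}}{\pi (s_1 + s_2)} \exp\bigl( -\frac{d^2}{2 (s_1 + s_2)} \bigr)$, where $d$ is the distance between the two positions. This restriction, the function names and the feature values are assumptions of the example.

```python
import math

def phi(f1, f2):
    """Pairwise dissimilarity for isotropic features (x, y, variance)."""
    (x1, y1, s1), (x2, y2, s2) = f1, f2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    c = math.exp(-d2 / (2.0 * (s1 + s2)))
    return 1.0 / (2.0 * math.pi) - c * math.sqrt(s1 * s2) / (math.pi * (s1 + s2))

def model_data_dissimilarity(model, data):
    """Offset term (best match per model feature) plus outlier term."""
    offset = sum(min(phi(m, d) for d in data) for m in model)
    outlier = (len(data) - len(model)) / (4.0 * math.pi)
    return offset + outlier

# Hypothetical data: one coarse blob and two finer features.
data = [(0.0, 0.0, 16.0), (0.0, 10.0, 4.0), (5.0, 10.0, 4.0)]

# Three hypotheses that all match their own features perfectly (zero offset)
# but explain one, two or all three of the data features.
scores = [model_data_dissimilarity(data[:n], data) for n in (1, 2, 3)]
best = min(range(3), key=lambda i: scores[i]) + 1
print(scores, best)
```

All three hypotheses have the same (zero) offset error, and only the outlier term separates them, so the hypothesis that explains all data features obtains the lowest value.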
3.3 Likelihood 

To find the best hypothesis of a hand $X_0$, we will search for the minimum of 
the dissimilarity measure $\Phi$ in (10) over $X$. For the purpose of tracking (using 
particle filtering as will be described in section 4), it is more convenient, however, 
to maximize a likelihood measure $p(I | X) = p(F^d | F^m)$ instead. Thus, we define 
a likelihood function as 

$$
p(F^d | F^m) = e^{-\Phi(F^m, F^d) / 2\sigma^2}, \tag{11}
$$

where the parameter $\sigma = 10^{-2}$ controls the sharpness of the likelihood function. 

4 Tracking and Recognition 

Tracking and recognition of a set of object models in time-dependent images can 
be formulated as the maximization of a posterior probability distribution over 
model parameters given a sequence of input images. To estimate the states of 
object models in this respect, we will follow the approach of particle filtering to 
propagate object hypotheses over time, where the likelihood of each particle is 
computed from the proposed likelihood and dissimilarity measures (11) and (10). 

To a major extent, we will follow traditional approaches for particle filtering 
as presented by (Isard and Blake 1996, Black and Jepson 1998, Sidenbladh et 
al. 2000, Deutscher et al. 2000) and others. Using the hierarchical multi-scale 
structure of the hand models, however, an adaptation of the layered sampling 
approach (Sullivan et al. 1999) will be presented, in which a coarse-to-fine search 
strategy is used to improve the computational efficiency, here, by a factor of two. 
Moreover, it will be demonstrated how the proposed dissimilarity measure makes 
it possible to perform simultaneous hand tracking and hand posture recognition. 

4.1 Particle Filtering 

Particle filters aim at estimating and propagating the posterior probability 
distribution $p(X_t, Y_t | \tilde{I}_t)$ over time, where $X_t$ and $Y_t$ are static and dynamic model 
parameters and $\tilde{I}_t$ denotes the observations up to time $t$. Using Bayes rule, the 
posterior at time $t$ can be evaluated according to 

$$
p(X_t, Y_t | \tilde{I}_t) = k \, p(I_t | X_t, Y_t) \, p(X_t, Y_t | \tilde{I}_{t-1}), \tag{12}
$$

where $k$ is a normalization constant that does not depend on the variables $X_t$, $Y_t$. The 
term $p(I_t | X_t, Y_t)$ denotes the likelihood that a model configuration $X_t$, $Y_t$ gives 
rise to the image $I_t$. Using a first-order Markov assumption, the dependence on 
observations before time $t - 1$ can be removed and the model prior $p(X_t, Y_t | \tilde{I}_{t-1})$ 
can be evaluated using a posterior from a previous time step and the distribution 
for model dynamics according to 

$$
p(X_t, Y_t | \tilde{I}_{t-1}) = \int p(X_t, Y_t | X_{t-1}, Y_{t-1}) \, p(X_{t-1}, Y_{t-1} | \tilde{I}_{t-1}) \, dX_{t-1} \, dY_{t-1}. \tag{13}
$$

Since the likelihood function is usually multi-modal and cannot be expressed in 
closed form, the approach of particle filtering is to approximate the posterior 
distribution using $N$ particles, weighted according to their likelihoods $p(I_t | X_t, Y_t)$. 
The posterior for a new time moment is then computed by populating the 
particles with high weights and predicting them according to their dynamic model. 

4.2 Hand Tracking and Recognition 

To use particle filtering for tracking and recognition of hierarchical hand models 
as described in section 3, we let the state variable $X$ denote the position $(x, y)$, 
the size $s$, the orientation $\alpha$ and the posture $l$ of the hand model, i.e., $X = 
(x, y, s, \alpha, l)$, while $Y$ denotes the time derivatives of the first four variables, 
i.e., $Y_t = (\dot{x}, \dot{y}, \dot{s}, \dot{\alpha})$. Then, we assume that the likelihood $p(I_t | X_t, Y_t)$ does not 
explicitly depend on $Y_t$, and approximate $p(I_t | X_t)$ by evaluating $p(F^d | F^m)$ for 
each particle according to (11). Concerning the dynamics $p(X_t, Y_t | X_{t-1}, Y_{t-1})$ of 
the hand model, a constant velocity model is adopted, where deviations from the 
constant velocity assumption are modelled by additive Brownian motion, from 
which the distribution $p(X_t, Y_t | X_{t-1}, Y_{t-1})$ is computed. To capture changes in 
hand postures, the state parameter $l$ is allowed to vary randomly for 30 % of the 
particles at each time step. 
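
A prediction step of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: a particle is represented as a pair of tuples (state, velocity), the Brownian deviations are modelled as Gaussian increments added to the velocities, and the noise levels in `noise` as well as the function name `predict` are made up for the example.

```python
import random

def predict(particles, dt=1.0, noise=(1.0, 1.0, 0.2, 0.02),
            switch_fraction=0.3, rng=random):
    """Constant-velocity prediction with random posture changes.

    Each particle is ((x, y, s, alpha, l), (vx, vy, vs, valpha)).
    """
    n = len(particles)
    # Indices of the particles whose posture l is redrawn at random.
    switching = set(rng.sample(range(n), int(switch_fraction * n)))
    predicted = []
    for idx, (state, velocity) in enumerate(particles):
        *continuous, l = state
        # Gaussian perturbation of the velocities (assumed noise model).
        new_velocity = tuple(v + rng.gauss(0.0, sd)
                             for v, sd in zip(velocity, noise))
        # Advance position, size and orientation by the perturbed velocity.
        new_continuous = tuple(q + v * dt
                               for q, v in zip(continuous, new_velocity))
        if idx in switching:
            l = rng.randint(1, 5)
        predicted.append((new_continuous + (l,), new_velocity))
    return predicted

rng = random.Random(0)
particles = [((100.0, 80.0, 20.0, 0.0, 5), (1.0, -0.5, 0.0, 0.01))
             for _ in range(10)]
predicted = predict(particles, rng=rng)
```

With ten particles, three of them have their posture redrawn from $l = 1 \ldots 5$, while all particles receive the perturbed constant-velocity update.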

When the tracking is started, all particles are first distributed uniformly over 
the parameter spaces $X$ and $Y$. After each time step of particle filtering, the best 
hypothesis of a hand is estimated, by first choosing the most likely hand posture 
and then computing the mean of $p(X_t, Y_t | \tilde{I}_t)$ for that posture. Hand posture 
number $i$ is chosen if $w_i = \max_j(w_j)$, $j = 1, \ldots, 5$, where $w_j$ is the sum of 
the weights of all particles with state $j$. Then, the continuous parameters are 
estimated by computing a weighted mean of all the particles in state $i$. 
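
The two-stage estimate described above can be sketched in a few lines. The function name `estimate_hand` and the example values are hypothetical, and the orientation is averaged naively, which ignores angular wrap-around.

```python
def estimate_hand(states, weights):
    """Pick the posture with the largest summed weight, then average.

    states  : list of (x, y, s, alpha, l)
    weights : list of particle weights (same length)
    """
    posture_weight = {}
    for (x, y, s, alpha, l), w in zip(states, weights):
        posture_weight[l] = posture_weight.get(l, 0.0) + w
    best = max(posture_weight, key=posture_weight.get)
    total = posture_weight[best]
    mean = [0.0, 0.0, 0.0, 0.0]
    for state, w in zip(states, weights):
        if state[4] == best:
            for k in range(4):
                # Weighted mean over the particles in the chosen posture only.
                # Note: alpha is averaged without handling wrap-around.
                mean[k] += w * state[k] / total
    return best, tuple(mean)

states = [(0.0, 0.0, 10.0, 0.0, 2),
          (2.0, 2.0, 10.0, 0.2, 2),
          (100.0, 100.0, 5.0, 1.0, 5)]
weights = [0.3, 0.3, 0.4]
posture, (x, y, s, alpha) = estimate_hand(states, weights)
print(posture, x, y, s, alpha)
```

In this example the single heaviest particle has posture 5, but posture 2 carries the larger total weight and is therefore selected; the continuous parameters are averaged over the posture-2 particles only.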

4.3 Hierarchical Layered Sampling 

The number of particles used for representing a distribution determines the speed 
and the accuracy of the particle filter. Usually, however, most of the particles 
represent false object hypotheses and serve to compensate for uncertainties 
in the estimated distribution. To reduce the number of such particles, and thus 
improve the computational efficiency, one approach is to divide the evaluation 
of the particles into several steps, and to eliminate unlikely particles already at 
the earliest stages of evaluation. This idea has been used previously in works 
on partitioned sampling (MacCormick and Isard 2000) and layered sampling 
(Sullivan et al. 1999). 

Layered sampling implies that the likelihood function $p(I_t | X_t)$ is 
decomposed as $p = p_1 p_2 \cdots p_n$ and that false hypotheses are eliminated by re-sampling 
the set of particles after a likelihood $p_i(I_t | X_t)$ has been evaluated at each layer 
$i = 1 \ldots n$. The idea is to use a coarse-to-fine evaluation strategy, where $p_1$ 
evaluates models at their coarsest scale, while $p_n$ performs the evaluation at the finest 
scale. 
In the context of hierarchical multi-scale feature models, the layered 
sampling approach can be modified so as to evaluate the likelihoods $p_i(I_t | X_t)$ 
independently for each level in the hierarchy of features. Hence, for the hand 
model described in section 2, the likelihood evaluation is decomposed into three 
layers $p = p_1 p_2 p_3$, where $p_1$ evaluates the coarse scale blob corresponding to 
the palm of a hand, $p_2$ evaluates the ridges corresponding to the fingers, and $p_3$ 
evaluates the fine scale blobs corresponding to the finger tips. 
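
The resampling structure of such a layered evaluation can be sketched as below. This is a simplified illustration: the layer likelihoods are passed in as ordinary functions, any perturbation of the particles between layers is left out, and the name `layered_resampling` as well as the toy likelihoods are assumptions of the example.

```python
import random

def layered_resampling(particles, layer_likelihoods, rng=random):
    """Resample the particle set after each layer, from coarse to fine."""
    n = len(particles)
    for likelihood in layer_likelihoods:        # p_1, p_2, ..., p_n
        weights = [likelihood(p) for p in particles]
        if sum(weights) <= 0.0:
            raise ValueError("all particles were rejected by this layer")
        # Particles with low likelihood at this layer tend to be dropped
        # before the finer (later) layers are evaluated.
        particles = rng.choices(particles, weights=weights, k=n)
    return particles

# Toy example: hypotheses are integers 0..99; the coarse layer accepts
# values below 50 and the fine layer accepts even values.
coarse = lambda p: 1.0 if p < 50 else 0.0
fine = lambda p: 1.0 if p % 2 == 0 else 0.0
survivors = layered_resampling(list(range(100)), [coarse, fine],
                               rng=random.Random(1))
```

After both layers, every surviving hypothesis satisfies the coarse and the fine criterion, and the set is distributed according to the product of the layer likelihoods.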

Experimentally, we have found that the hierarchical layered sampling 
approach improves the computational efficiency of the tracker by a factor of two, 
compared to the standard sampling method in particle filtering. Figure 4 
illustrates a comparison between these two approaches concerning the performance 
of the hand posture recognition step of the tracker; see (Laptev and Lindeberg 
2000) for a more extensive description. 



[Figure 4: two plots, titled “Hierarchical layered sampling” and “Standard sampling”] 

Fig. 4. Curves representing probabilities of model states $l = 1, \ldots, 5$ while tracking 
a hand with changing postures. The results are shown for the hierarchical vs. the 
standard sampling technique, using the same number of particles. 



5 Hand Gesture Analysis 

An application we are interested in is to track hands in office and home environ- 
ments, in order to provide the user with a convenient human-machine interface 
for expressing commands to different types of computerized devices using hand 
gestures. The idea is to associate the recognised hand states with actions, while 
using the estimated continuous parameters of the hand model to control the 
actions in a quantitative way. 

The problem of hand gesture analysis has received increased attention in 
recent years. Early work on using hand gestures for television control was presented 
by (Freeman and Weissman 1995) using normalised correlation. Some approaches 
consider elaborate 3-D hand models (Rehg and Kanade 1995), while others use 
colour markers to simplify feature detection (Cipolla et al. 1993). Appearance- 
based models for hand tracking and sign recognition were used by (Cui and Weng 
1996), while (Heap and Hogg 1998, MacCormick and Isard 2000) used silhou- 
ettes of hands. Graph-like and feature-based hand models have been proposed 
by (Triesch and von der Malsburg 1996) for sign recognition and in (Bretzner 
and Lindeberg 1998) for tracking and estimating 3-D rotations of a hand. 

The proposed approach is based on these works and is novel in the respect 
that it combines a hierarchical object model with image features at multiple 
scales and particle filtering for robust tracking and recognition. 
5.1 Multi-state Hand Tracking 



To investigate the proposed approach, an experiment was performed in which 
hands in different states were tracked in an office environment with natural illumination. 
The particle filtering was performed with $N = 1000$ particles, which were 
evaluated on the $N^d = 200$ strongest scale-space features extracted from each image. 
Figures 5(a)-(c) show a few results from this experiment. As can be seen, the 
combination of particle filtering with the dissimilarity measure for hierarchical 
object models correctly captures changes in the position, scale and orientation 
of the hand. Moreover, changes in hand postures are captured. 




Fig. 5. Result of applying the proposed framework for tracking a hand in an office 
environment: (a) size variations; (b) rotations; (c) a change in hand state $l$: 5 → 2. 

As a test of the stability of the hand tracker, we developed a prototype of a 
drawing tool called DrawBoard, where hand motions are used for controlling a 
visual drawing in a multi-functional way. In this application, the cursor on the 
screen was controlled by the position of the hand, and depending on the state of 
the hand, different actions could be performed. A hand posture with two fingers 
implied that DrawBoard was in a drawing state, while a posture with one finger 
meant that the cursor moved without drawing. With three fingers present, the 
shape of the brush could be changed, while a hand posture with five fingers 
was used for translating, rotating and scaling the drawing. Figure 6 shows a few 
snapshots from such a drawing session.¹ As can be seen from the results, the 
performance of the tracker is sufficient for producing a reasonable drawing. 

A necessary pre-requisite for this purely intensity-based system to give sat- 
isfactory results is that there is a clear contrast in intensity between the object 
and the background. In on-going work, it is shown that the sensitivity to the 
choice of background can be reduced substantially by (i) performing colour- 
based feature detection, and by (ii) including a complementary prior on skin 
colour. In a project for computer-vision-based human-computer-interaction, this 
extended system is used for capturing hand gestures controlling different types 
of computerized equipment (Bretzner et al. 2001). 

The integrated algorithm currently runs at a frame rate of about 10 Hz on a modest 
dual processor PC with two 550 MHz Pentium III processors. An important 
component in reaching real-time performance is an efficient pyramid implementa- 
tion of the multi-scale feature detection step (Lindeberg and Niemenmaa 2001). 

¹ A longer movie clip is available from http://www.nada.kth.se/cvap/gvmdi/. 

[Figure 6: panels (a)-(d), with titles including “Drawing with a pencil of varying size”, “Drawing with the elliptic pencil” and “Rotating the drawing”] 


Fig. 6. DrawBoard. The hand is used as a drawing device where the position, the size 
and the orientation of a pencil are controlled by the corresponding parameters of a 
hand in the image (a),(c). In (b) the user is able to change the elliptic shape of a pencil 
by rotating a hand in a state with three open fingers. In (d) the drawing is scaled and 
rotated with a hand in a state with five open fingers. 



6 Summary and Discussion 

We have demonstrated how a view-based object representation in terms of a 
hierarchy of multi-scale image features can be used for tracking and recognition in 
combination with particle filtering, based on a scale-invariant dissimilarity 
measure, which relates features in the object representation to image data and 
enables discrimination between different spatial configurations. The combination of 
this measure with multi-scale features makes the approach truly scale-invariant 
and allows for object tracking and recognition under large size variations. 

In an application to hand gesture analysis, we have shown how qualitative 
states and quantitative motions of a hand can be captured. In this context, the 
use of a hierarchical multi-scale model allows us to perform hierarchical layered 
sampling, which improves the computational efficiency by reducing the number 
of particles. 

In combination with a pyramid implementation of the feature detection stage, 
real-time performance has been obtained, and the system has been tested in ap- 
plication scenarios with human-computer interaction based on hand gestures. In 
this context, the qualitative hand states were used for selecting between different 
actions, while the continuous parameters were used for controlling these actions 
in a quantitative way. 

Although a main emphasis here has been on hand models, we believe that the 
proposed framework can be extended for tracking and recognizing broader classes 
of objects consisting of qualitatively different structures at different scales. 
Abstract. We establish, in 2D, the PDE associated with a classical deblurring
filter, the Kramer operator, and compare it with another classical
shock filter.



1 Introduction 



Gabor remarked in 1960 that the difference between the original image $u_0$ and a
blurred version of it, $k * u_0$, is roughly proportional to its Laplacian. In order to
formalize this remark, we have to notice that $k$ is spatially concentrated, and
that we may introduce a scale parameter for $k$, namely $k_h(x) = h^{-1} k(h^{-1/2} x)$. Then

$$\frac{u_0 * k_h(x) - u_0(x)}{h} \to \Delta u_0(x),$$

so that when $h$ gets smaller, the blur process looks more and more like the heat
equation

$$\frac{\partial u}{\partial t} = \Delta u, \quad u(0) = u_0.$$

Conversely, Gabor deduced that we can, to some extent, deblur an image by
reversing time in the heat equation:

$$\frac{\partial u}{\partial t} = -\Delta u, \quad u(0) = u_{observed}.$$

Numerically, this amounts to iteratively subtracting its Laplacian from the
observed image:

$$u_{restored} = u_{observed} - h\,\Delta u_{observed}.$$


This operation can be repeated several times with some small values of h, un- 
til it... blows up. Indeed, the reverse heat equation is extremely ill-posed. All 
the same, this Gabor method is efficient and can be applied with some success 
to most digital images obtained from an optical device. See also 0 for other 
information about Gabor image enhancement. 
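The iteration described above can be illustrated with a minimal sketch. It assumes a five-point discrete Laplacian with replicated borders and an image stored as a list of rows; the function names `laplacian` and `gabor_sharpen` and the step size are illustrative choices, not taken from the paper.

```python
def laplacian(u):
    """Five-point discrete Laplacian; border pixels are replicated (assumption)."""
    rows, cols = len(u), len(u[0])

    def at(i, j):
        # Clamp indices so that out-of-range neighbours repeat the border value.
        return u[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [
        [at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1) - 4.0 * u[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]


def gabor_sharpen(u, h=0.1, steps=1):
    """Apply u <- u - h * Laplacian(u) `steps` times (explicit reverse heat step)."""
    for _ in range(steps):
        lap = laplacian(u)
        u = [[u[i][j] - h * lap[i][j] for j in range(len(u[0]))]
             for i in range(len(u))]
    return u


# A blurred vertical edge, identical on every row.
blurred_edge = [[0.0, 0.0, 0.25, 0.75, 1.0, 1.0] for _ in range(3)]
sharpened = gabor_sharpen(blurred_edge, h=0.1, steps=1)
unstable = gabor_sharpen(blurred_edge, h=0.1, steps=50)
```

One step steepens the edge (the values on either side of the transition move apart) but already produces a small undershoot next to it; after many steps the oscillations grow without bound, which is the ill-posedness mentioned in the text.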



M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 75-f^3 2001. 
© Springer-Verlag and IEEE/CS 2001
Some attempts to improve the time-reverse heat equation are:

The Osher-Rudin Equation of Shock Filter. Osher and Rudin CH proposed to
sharpen a blurred image $u_0$ by applying the following equation:

$$\frac{\partial u}{\partial t} = -\mathrm{sign}(\Delta u)\,|Du|, \quad \text{with } u(0, x) = u_0(x).$$

This can be seen as a pseudo-inverse of the heat equation, where the propagation
term $|Du|$ is tuned by the sign of the Laplacian. In the following, we call $TR_t$
the operator that associates to $u_0(\cdot)$ the function $u(t, \cdot)$.

The Kramer Algorithm. In P], Kramer defines a filter for sharpening blurred
images. The filter replaces the gray level value at a point by either the minimum or
the maximum of the gray level values in a neighborhood, the choice depending
on which is the closest to the current value. This filter is then localized and
iterated on the image. Let us call $TK$ this filter. Kramer's filter can be interpreted
as a partial differential equation, by the same kind of heuristic arguments which
Gabor developed to derive the heat equation. In H2| the authors proposed a
finer version of the Kramer filter, weighting the minimum or the maximum
by a parabolic function (see below).
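As an illustration of the rule just described, here is a minimal sketch of one iteration of the flat Kramer filter. It assumes a 3x3 neighborhood clipped at the image border and leaves a pixel unchanged when it is equidistant from the local minimum and maximum; the name `kramer_step` and these choices are assumptions made for the example.

```python
def kramer_step(u):
    """One pass of the flat Kramer filter on a list-of-rows image (3x3 window)."""
    rows, cols = len(u), len(u[0])
    out = [row[:] for row in u]
    for i in range(rows):
        for j in range(cols):
            # Neighbourhood values, clipped at the image border.
            window = [u[a][b]
                      for a in range(max(i - 1, 0), min(i + 2, rows))
                      for b in range(max(j - 1, 0), min(j + 2, cols))]
            lo, hi = min(window), max(window)
            v = u[i][j]
            if hi - v < v - lo:
                out[i][j] = hi      # the maximum is the closest value
            elif hi - v > v - lo:
                out[i][j] = lo      # the minimum is the closest value
            # on a tie the pixel is left unchanged
    return out


smooth_edge = [[0.0, 0.1, 0.4, 0.9, 1.0]]
print(kramer_step(smooth_edge))  # [[0.0, 0.0, 0.1, 1.0, 1.0]]
```

Iterating `kramer_step` pushes each side of the transition toward its local extremum, so the smooth ramp becomes a sharp step.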

It was proved, in H2!, that the Kramer and the Osher-Rudin filters share the
same asymptotic behavior for regular 1D signals; that is, they are infinitesimally
identical in 1D. However, as we will see, this is not true in 2D, that is, in the case
of images. As we shall see, the equation associated with the Kramer filter in 2D is

$$\frac{\partial u}{\partial t} = -\mathrm{sign}(D^2u(Du, Du))\,|Du|.$$

Thus, the Laplacian is replaced by a directional second derivative of the image,
$D^2u(Du, Du)$.

The general aim of this paper is to prove this result and to establish the
links and differences between the two filters in 2D. It is organized as follows.

In Section 2 we recall or establish some mathematical statements on scaled
general monotone operators and their asymptotics. In Section 3 we establish the
asymptotics of the Kramer filter in 2D and, conversely, we propose an algorithm
similar to Kramer's that simulates the Osher-Rudin equation. Finally, in Section
4 we underline a link between these filters and two very classical edge detectors:
the Canny detector and the zero-crossings of the Laplacian. Most of the statements with
complete proofs can be found in the manuscript notes 0.

Before turning to the core of this note, let us point out that several methods for
improving the time-reverse heat equation have been proposed. Some are similar to
the two we are going to study, such as the contrast enhancement filters in jO].
Another class of such attempts covers so-called "restoration" methods. Roughly,
the main idea is to search for a function $u_{restored}$ so that, once blurred with
the heat equation, $u_{restored}$ gives the original image $u_{observed}$. Since applying
the heat equation is equivalent to a convolution with an appropriate Gaussian
function $G_\sigma$, one searches for $u_{restored}$ so that $u_{restored} * G_\sigma = u_{observed} + \text{noise}$.
This last equation does not yield a unique solution. So, among the possible solutions,
$u_{restored}$ is chosen as the most regular one. The regularity is then measured by an
energy or norm, like the Total Variation in m. The "restoration" methods seem
to give the most promising deblurring results. This note will definitely not cover
these techniques; however, we will give an example at the end for comparison with
the shock filters.

2 General Form of Scaled Monotone Operator and 
Mathematical Tools 

2.1 Scaled Monotone Operators 

We consider a family $\mathcal{F}$ of functions from $\mathbb{R}^2$ into $\mathbb{R}$ representing a class of
images. We define an image operator on $\mathcal{F}$ as an operator from $\mathcal{F}$ into $\mathcal{F}$. An
operator $T$ is said to be monotone if

$$(\forall x \in \mathbb{R}^2,\ u(x) \ge v(x)) \Rightarrow (\forall x,\ Tu(x) \ge Tv(x)).$$

The following theorem gives us a general form for any monotone and translation
invariant operator:

Theorem 1 (Matheron |jHj, Serra Maragos [Z]). Let $T$ be a monotone
function operator defined on $\mathcal{F}$, invariant by translation and commuting with the
addition of constants. There exists a family $\mathbb{F}$ of functions such that

$$Tu(x) = \sup_{f \in \mathbb{F}} \inf_{y \in \mathbb{R}^N} u(y) - f(x - y).$$

Note: It is a general property of monotone and translation invariant
filters to preserve the Lipschitz property of any Lipschitz function. As a consequence,
$\mathcal{F}$ can by default be taken to be the set of Lipschitz
functions.

Definition 2. We define a $\beta$-scaled operator $T_h$ associated to $T$ by

$$T_h(u)(x) = \inf_{f \in \mathbb{F}} \sup_{y \in \mathbb{R}^N} \big(u(x + y) - h^\beta f(y/h)\big). \quad (1)$$



2.2 Asymptotic Theorem 

The utility of the Legendre transform in the study of monotone operators
has been noticed in e.g. pj. Let us recall it:

Legendre Fenchel Transform.

Definition 3. Let $f$ be a function from $\mathbb{R}^N$ into $\mathbb{R}$. We denote the Legendre
transform of $f$ by $f^* : \mathbb{R}^N \to \overline{\mathbb{R}}$, defined by

$$f^*(p) = \sup_{x \in \mathbb{R}^N} (p \cdot x - f(x)).$$

Let us note that if $f$ is convex then the Legendre transform is finite for every $p$.
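A small numerical sketch may help fix ideas. It approximates the Legendre transform in one dimension by a maximum over a finite grid, for the two structuring functions that appear in Section 3.1: the parabola $|x|^2/2$ and the indicator-type function that vanishes on the unit ball and is $+\infty$ elsewhere. The grid, the closed ball $|x| \le 1$, and the helper names are assumptions of this example.

```python
def legendre(f, p, xs):
    """Grid approximation of f*(p) = sup_x (p * x - f(x)) in one dimension."""
    return max(p * x - f(x) for x in xs)


def parabola(x):
    return 0.5 * x * x


def unit_ball_indicator(x):
    # 0 inside the (closed) unit ball, +infinity outside.
    return 0.0 if abs(x) <= 1.0 else float("inf")


xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid on [-5, 5]

print(legendre(parabola, 1.5, xs))              # close to 1.5**2 / 2 = 1.125
print(legendre(unit_ball_indicator, -2.0, xs))  # close to |-2| = 2
```

The two results match the closed forms $f^*(p) = |p|^2/2$ and $f^*(p) = |p|$, which are the functions $H$ appearing in Lemma 7.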
First Order Asymptotic.

Lemma 4. Let $f$ be a function satisfying the following conditions:

$$\exists\, C > 0 \text{ and } \alpha > \max(\beta, 1) \text{ such that } \liminf_{|x| \to \infty} \frac{f(x)}{|x|^\alpha} \ge C \text{ and } f(0) \le 0. \quad (2)$$

Then, for any $C^2$ and bounded function $u$, if $\beta < 2$:

$$\sup_{y \in \mathbb{R}^N} \big(u(x + y) - h^\beta f(y/h)\big) - u(x) = h^\beta f^*(h^{1-\beta} Du(x)) + O\big(h^{2(1 - \frac{\beta - 1}{\alpha - 1})}\big).$$

An interesting particular case is when $\beta = 1$:

$$\sup_{y \in \mathbb{R}^N} \big(u(x + y) - h f(y/h)\big) - u(x) = h f^*(Du(x)) + O(h^2).$$
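The particular case $\beta = 1$ can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: $u = \sin$, $f(y) = y^2/2$ (so that $f^*(p) = p^2/2$), and the supremum is approximated on a fine grid over $[-1, 1]$, which is enough here because the penalty $h f(y/h)$ dominates the bounded function $u$ outside that interval. The helper name `scaled_sup` is hypothetical.

```python
import math


def scaled_sup(u, f, x, h, half_width=1.0, n=20001):
    """Grid approximation of sup_y (u(x + y) - h * f(y / h)) over [-half_width, half_width]."""
    ys = [-half_width + 2.0 * half_width * k / (n - 1) for k in range(n)]
    return max(u(x + y) - h * f(y / h) for y in ys)


def f(y):
    return 0.5 * y * y


def f_star(p):
    return 0.5 * p * p


x, h = 0.3, 0.01
lhs = scaled_sup(math.sin, f, x, h) - math.sin(x)   # left side of the lemma
rhs = h * f_star(math.cos(x))                       # h * f*(Du(x))
print(lhs, rhs, abs(lhs - rhs))
```

For these values the two sides differ by an amount well below $h^2$, in line with the $O(h^2)$ remainder.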



Proof. Without loss of generality we can choose $x = 0$ and $u(x) = 0$, so that we
are looking for an estimate of

$$\sup_{z \in \mathbb{R}^N} \big(u(z) - h^\beta f(z/h)\big)$$

when $h$ tends to 0. Setting $y = z/h$, we have

$$\sup_{z \in \mathbb{R}^N} \big(u(z) - h^\beta f(z/h)\big) = \sup_{y \in \mathbb{R}^N} \big(u(hy) - h^\beta f(y)\big).$$

Let us first prove that we can discard from the preceding sup the $y$ that go
too fast toward $\infty$ as $h$ tends to 0. We consider the subset $S_h$ of $\mathbb{R}^N$ of the $y$
such that

$$u(hy) - h^\beta f(y) \ge u(0) - h^\beta f(0) \ge 0.$$

We obviously have

$$\sup_{y \in \mathbb{R}^N} \big(u(hy) - h^\beta f(y)\big) = \sup_{y \in S_h} \big(u(hy) - h^\beta f(y)\big).$$

Since $u$ is bounded, we have $\forall y \in S_h$, $f(y) \le C_1 h^{-\beta}$ for some constant $C_1$
depending only on $\|u\|_\infty$. Assume that there exists $y_h \in S_h$ tending to $\infty$ as $h$
tends to zero. For $h$ small enough, condition (2) gives $f(y_h) \ge C |y_h|^\alpha$, which
combined with the preceding inequality yields $|y_h| \le C_2 h^{-\beta/\alpha}$. Such a bound
also holds if $y_h \in S_h$ is bounded, so that we have

$$\forall y \in S_h, \quad |y| \le C_2 h^{-\beta/\alpha}.$$

As a consequence, $\forall y \in S_h$ we have $|hy| = o(1)$ and we can do an expansion of
$u$ around 0, so that

$$\sup_{y \in \mathbb{R}^N} \big(u(hy) - h^\beta f(y)\big) = \sup_{y \in S_h} \big(h\,Du(0) \cdot y - h^\beta f(y) + O(h^2 |y|^2)\big).$$

We can now find a finer bound for the set $S_h$ by repeating the same argument. Writing
$p = Du(0)$, for all $y \in S_h$ we have

$$h\,p \cdot y - h^\beta f(y) + O(h^2 |y|^2) \ge 0,$$

which yields

$$|p| \ge h^{\beta - 1} f(y)/|y| + O(h |y|).$$

Assume that $y_h \in S_h$, satisfying the preceding inequality, tends to $\infty$ when $h$
tends to 0. Then, by (2), we obtain $|y_h| = O(h^{-\frac{\beta - 1}{\alpha - 1}})$. Once again, if $y_h$ is bounded
this estimate holds. So we have

$$\sup_{y \in \mathbb{R}^N} \big(u(hy) - h^\beta f(y)\big) = h^\beta \Big( \sup_{y \in S_h} \big(h^{1-\beta} p \cdot y - f(y) + O(h^{2-\beta} |y|^2)\big) \Big)$$

$$= h^\beta \Big( \sup_{y \in \mathbb{R}^N} \big(h^{1-\beta} p \cdot y - f(y)\big) \Big) + O\big(h^{2(1 - \frac{\beta - 1}{\alpha - 1})}\big) = h^\beta f^*(h^{1-\beta} p) + O\big(h^{2(1 - \frac{\beta - 1}{\alpha - 1})}\big).$$

It is easily checked that this error term is $o(h^\beta)$ for all $\beta < 2$. □



Theorem 5. Let $\mathbb{F}$ be a family of functions, all satisfying the condition (2)
with a constant $C$ not depending on the choice of a function within the family.
Let $T_h$ be the $\beta$-scaled operator associated with the family $\mathbb{F}$ and with a rescaling
parameter $\beta$ equal to 1. Then for all $C^2$ and bounded functions $u$ we have:

$$\frac{(T_h(u) - u)(x)}{h} = H_1(Du(x)) + o(1),$$

where

$$H_1(p) = \inf_{f \in \mathbb{F}} f^*(p).$$

This theorem is an immediate consequence of the preceding lemma applied with
$\beta = 1$.

Second Order Case - Some Heuristics, See P]. Theorem 5 gives the possible
first order behavior of a non-flat monotone operator. The question arises of
what happens if this first order term is 0, that is, if $H_1(p) = 0$ for all $p$. In that
case, it is necessary to push the expansion to the second order:

We have, with $p = Du(0)$ and $A = D^2u(0)/2$,

$$\sup_{y \in \mathbb{R}^N} u(hy) - h^\beta f(y) = \sup_{y \in \mathbb{R}^N} h\,p \cdot y + h^2 A y \cdot y - h^\beta f(y) + O(|hy|^3).$$

Since this last expression is increasing with respect to $A$, it is then expected
that the left side of the equality converges, when $h$ tends to 0, to some function
$F(A, p)$ where $F$ is non-decreasing with respect to $A$. As a consequence, among
second order operators only elliptic operators can be obtained as the asymptotic
limit of a general localized monotone operator [J.
Example: The Heat Equation as the asymptotic of a non-flat morphological
operator (in N-D).

Lemma 6. We set, for $p \in \mathbb{R}^N$, $Q$ a symmetric $N \times N$ matrix, and $h > 0$,

$$f_{p,Q,h}(x) = p \cdot x + Q(x, x) \ \text{if } x \in B(0, h), \qquad f_{p,Q,h}(x) = -\infty \ \text{otherwise}.$$

We then set $\mathbb{F}_h = \{f_{p,Q,h}$ with $Q$ symmetric, $\mathrm{Tr}(Q) = 0$ and $p \in \mathbb{R}^N\}$, which

is to say that IF h is made of the truncature around zero of all quadratic forms 
whose trace is zero. With Th{u)(x.) = supygjfjw rt(x + y) — /(y), one has 

for any u G , 

Th{u){yi) - m(x) = ^/iMm(x) + o { h ^) 

Proof is just made of simple algebraic computations. Complete proof can be 
found in |^. 



3 Application to Image Enhancement: Kramer's
Operators and the Osher-Rudin Shock Filter



3.1 The Kramer Operator 

Let us now define mathematically the Kramer filter and its variant proposed in
m. The asymptotics of the Kramer filter and of its variant are similar, though not strictly
identical, but the proofs are the same for both. They can be defined
as follows:

For the Kramer operator set $q(x) = 0$ if $|x| < 1$ and $q(x) = +\infty$ otherwise.
For its variant m simply set $q(x) = |x|^2/2$. Then for both set $\mathbb{F}^+ = \{q\}$. Let
$T_h^+$ be the rescaled (with $\beta = 1$) non-flat operator associated with the set of structuring
elements $\mathbb{F}^+$, and $T_h^-$ its dual operator. That is,

$$(T_h^+ u)(x) = \sup_{y \in \mathbb{R}^N} u(y) - h\,q((x - y)/h),$$

$$(T_h^- u)(x) = \inf_{y \in \mathbb{R}^N} u(y) + h\,q((x - y)/h).$$

The shock filter $TK_h$ is then defined by

$$(TK_h u)(x) = \begin{cases} (T_h^+ u)(x) & \text{if } (T_h^+ u)(x) - u(x) < u(x) - (T_h^- u)(x), \\ (T_h^- u)(x) & \text{if } (T_h^+ u)(x) - u(x) > u(x) - (T_h^- u)(x), \\ u(x) & \text{otherwise.} \end{cases} \quad (3)$$

It is important to note that $TK$ is NOT a monotone operator but that $T^+$
and $T^-$ are monotone. $TK$ can be seen as a conditional filter made of two
monotone operators.



A Note on Two Classical Shock Filters and Their Asymptotics 






Lemma 7. First order asymptotic of the Kramer Filter.

$$(T_h^+ u)(x) - u(x) = h H(|Du(x)|) + O(h^2) \quad\text{and}\quad (T_h^- u)(x) - u(x) = -h H(|Du(x)|) + O(h^2).$$

So that

$$\lim_{h \to 0} \frac{(TK_h u)(x) - u(x)}{h} = \pm H(|Du(x)|) \text{ or } 0,$$

where $H(p) = |p|$ in the case of the Kramer filter, and $H(p) = |p|^2/2$ for its
variant.

This lemma is an immediate consequence of Theorem 5. At this step, we
remark that the differences $(T_h^+ u)(x) - u(x)$ and $u(x) - (T_h^- u)(x)$ are equal at
the first order, and therefore the choice will be made based on second order
derivatives of $u$.

Proposition 8. One has, for any function $u$ that is $C^2$ around $x$,

$$\lim_{h \to 0} \frac{((TK_h u) - u)(x)}{h} = -\mathrm{sign}(D^2u(Du, Du))\, H(|Du(x)|),$$

where $H(p) = |p|$ in the case of the Kramer filter, and $H(p) = |p|^2/2$ for its
variant.

Proof. One has to push the asymptotics of $T_h^+$ and $T_h^-$ to the second order. We
only treat here the case of the variant filter. We have

$$T_h^+(u)(x) = \sup_{y \in \mathbb{R}^N} u(y) - \frac{|x - y|^2}{2h} \quad\text{and}\quad T_h^-(u)(x) = \inf_{y \in \mathbb{R}^N} u(y) + \frac{|x - y|^2}{2h}.$$

Since $T_h^+$ and $T_h^-$ are translation invariant, we can limit our study to $x = 0$.
Moreover, since $u$ is bounded, we can limit the sup to the $y \in B(0, h)$. If $u$ is
$C^2$ at point 0, we can set $u(y) = u(0) + p \cdot y + A(y, y) + o(|y|^2)$, with $p = Du(0)$
and $A = D^2u(0)/2$. So that

$$T_h^+(u)(0) - u(0) = \sup_{y \in B(0,h)} u(y) - \frac{|y|^2}{2h} - u(0) = \sup_{y \in B(0,h)} \Big(p \cdot y + A(y, y) - \frac{|y|^2}{2h}\Big) + o(h^2).$$

Set $Q_h(y) = 2h\,p \cdot y + (2hA - Id)(y, y)$, so that we have

$$T_h^+(u)(0) - u(0) = \sup_{y \in B(0,h)} \big(Q_h(y)/(2h)\big) + o(h^2).$$

For $h$ small enough, $B_h = Id - 2hA$ is positive and invertible. Therefore, the sup
of $Q_h(y) = 2h\,p \cdot y - B_h(y, y)$ over the $y$ exists, and is achieved for $y_h$ such that

$$2h\,p - 2 B_h y_h = 0 \ \Rightarrow\ y_h = h B_h^{-1}(p).$$

Thus,

$$T_h^+(u)(0) - u(0) = \frac{h}{2}(Id - 2hA)^{-1}(p, p) + o(h^2) = \frac{h}{2}(Id + 2hA)(p, p) + o(h^2).$$

We conclude that

$$T_h^+(u)(0) - u(0) = \frac{h}{2}|p|^2 + h^2 A(p, p) + o(h^2). \quad (4)$$

Similarly,

$$u(0) - T_h^-(u)(0) = \frac{h}{2}|p|^2 - h^2 A(p, p) + o(h^2). \quad (5)$$

From these last two equalities and translation invariance we deduce that, for all $x$:

$$\big((T_h^+ u)(x) - u(x)\big) - \big(u(x) - (T_h^- u)(x)\big) = 2h^2 A(p, p) + o(h^2) = h^2 (D^2u(x))(Du(x), Du(x)) + o(h^2). \quad (6)$$

The sign of $D^2u(x)(Du(x), Du(x))$ therefore decides which of the two values is the
nearest to $u(x)$, and we have

$$TK_h(u)(x) - u(x) = -h\,|Du(x)|^2\, \mathrm{sgn}\big(D^2u(x)(Du(x), Du(x))\big)/2 + O(h^2).$$

□

Let us remark that if $u$ is a 1D function, then $\mathrm{sign}(D^2u(Du, Du))$ coincides
with the sign of the Laplacian. That is, the Kramer operator corresponds
asymptotically, in 1D, to the Osher-Rudin shock filter. However, they differ in
2D.
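A concrete 2D case where the two signs disagree can be checked with finite differences. The test function $u(x, y) = x + x^2/2 - y^2$ and the helper name `second_order_signs` are choices made for this illustration only; at the origin, $Du = (1, 0)$ and $D^2u = \mathrm{diag}(1, -2)$.

```python
def second_order_signs(u, x, y, e=1e-3):
    """Return (D^2u(Du, Du), Laplacian of u) at (x, y) using central differences."""
    ux = (u(x + e, y) - u(x - e, y)) / (2 * e)
    uy = (u(x, y + e) - u(x, y - e)) / (2 * e)
    uxx = (u(x + e, y) - 2 * u(x, y) + u(x - e, y)) / e ** 2
    uyy = (u(x, y + e) - 2 * u(x, y) + u(x, y - e)) / e ** 2
    uxy = (u(x + e, y + e) - u(x + e, y - e)
           - u(x - e, y + e) + u(x - e, y - e)) / (4 * e ** 2)
    directional = uxx * ux * ux + 2 * uxy * ux * uy + uyy * uy * uy
    return directional, uxx + uyy


def example(x, y):
    return x + 0.5 * x * x - y * y


directional, lap = second_order_signs(example, 0.0, 0.0)
print(directional, lap)   # approximately 1 and -1
```

Here $D^2u(Du, Du) > 0$ while $\Delta u < 0$, so at this point the Kramer limit equation and the Osher-Rudin equation move $u$ in opposite directions.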

3.2 The Osher-Rudin Shock Filter

The Osher-Rudin shock filter was proposed by its authors directly in its
asymptotic form, that is, as a PDE. The question arises whether there exists a
scheme similar to Kramer's that would simulate the Osher-Rudin
shock filter. Such a scheme can be defined as follows:

Let $B_h$ be a disk of radius $h$ centered at 0. Let $\mathrm{Mean}_h$ be the mean value on
the disk $B_h$. We define the operator $T_h$ by:

$$T_h u(x) = \begin{cases} \min_{y \in B_h} u(x + y) & \text{if } \mathrm{Mean}_h(u)(x) > u(x), \\ \max_{y \in B_h} u(x + y) & \text{if } \mathrm{Mean}_h(u)(x) < u(x), \\ u(x) & \text{otherwise.} \end{cases}$$

Proposition 9. One has

$$\lim_{h \to 0} \frac{T_h u - u}{h} = -\mathrm{sign}(\Delta u)\,|Du|.$$

The proof follows from the fact that $\mathrm{Mean}_h(u)(x) - u(x) = \frac{h^2}{8}\Delta u(x) + o(h^2)$.
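The scheme above can be sketched on a discrete image as follows. This is a minimal illustration that assumes a 3x3 window (clipped at the border) as a stand-in for the disk $B_h$; the name `mean_shock_step` is hypothetical.

```python
def mean_shock_step(u):
    """One pass of the mean-driven min/max scheme on a list-of-rows image."""
    rows, cols = len(u), len(u[0])
    out = [row[:] for row in u]
    for i in range(rows):
        for j in range(cols):
            # 3x3 window clipped at the border, used in place of the disk B_h.
            window = [u[a][b]
                      for a in range(max(i - 1, 0), min(i + 2, rows))
                      for b in range(max(j - 1, 0), min(j + 2, cols))]
            mean = sum(window) / len(window)
            if mean > u[i][j]:
                out[i][j] = min(window)   # local mean above u: take the minimum
            elif mean < u[i][j]:
                out[i][j] = max(window)   # local mean below u: take the maximum
    return out


smooth_edge = [[0.0, 0.1, 0.4, 0.9, 1.0]]
print(mean_shock_step(smooth_edge))  # [[0.0, 0.0, 0.1, 1.0, 1.0]]
```

On this one-row ramp the result coincides with the output of the nearest-extremum rule of the Kramer filter, consistent with the remark that the two filters agree asymptotically in 1D.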

4 Conclusion 

The preceding study makes it possible to establish the following diagram. It clarifies
the relationships between the various kinds of edge detectors and deblurring
filters.

In 2D:

                           Kramer Filter                     Osher-Rudin Filter

PDE                        du/dt =                           du/dt =
                           -|Du| sgn(D^2u(Du, Du))           -|Du| sgn(Δu)

Scheme: replace u by       min or max of u                   min or max of u
                           in a neighborhood                 in a neighborhood

depending                  on which of the two               on whether the mean of u
                           is the nearest to u               in a neighborhood
                                                             is above or below u

Associated edge detector   Canny edge detector               Zero-crossing of Laplacian
                           D^2u(Du, Du) = 0                  Δu = 0


In particular, by the same arguments as Osher-Rudin El, we can claim that
Kramer's operator enhances Canny edges. Osher and Rudin proved that their
operator enhances Hildreth-Marr edges.

The figure illustrates the effect of both operators: on the left, an original blurred
image; next, the Osher and Rudin shock filter (steady state); next, the improved
Kramer operator (steady state); and finally, for comparison purposes, the Rudin-
Osher-Fatemi deblurring method mg. The latter, in contrast to the two filters studied in
this paper, makes explicit use of knowledge about the blurring kernel and the noise
statistics.
blurred image, Top-right: Osher-Rudin shock filter m, which is a pseudoinverse of
the heat equation attaining a steady state, Bottom-left: Kramer's improved shock
filter na, also attaining a steady state, and Bottom-right: the Rudin, Osher, Fatemi
restoration method m, obtained by deblurring with a controlled image total variation
and using explicit knowledge of the blurring kernel.
Abstract. Bayesian statistical theory is a convenient way of taking a
priori information into consideration when inference is made from im- 
ages. In Bayesian image detection, the a priori distribution should cap- 
ture the knowledge about objects. Taking inspiration from [Q, we design 
a prior density that penalizes the area of homogeneous parts in images. 
The detection problem is further formulated as the estimation of the set 
of curves that maximizes the posterior distribution. In this paper, we ex- 
plore a posterior distribution model for which its maximal mode is given 
by a subset of level curves, that is the boundaries of image level sets. For 
the completeness of the paper, we present a stepwise greedy algorithm 
for computing partitions with connected components. 



1 Introduction 

In most problems of image analysis, incorporation of prior knowledge is im- 
portant for making inference based on the images. Bayesian object detection is 
the problem of how to estimate the number of simply connected objects and 
their location in a non-ideal environment. Bayesian approaches specify ways for 
segmenting the entire image using global energy criteria. Indeed, it is usually 
straightforward to transfer a Bayesian criterion into an energy minimization
criterion. In addition, additivity is desirable in models which must be analyzed by
Markov chain Monte Carlo sampling. Thereby, the discrete IBI2I or continuous
fIHI17j energy functional is traditionally designed as a combination of several
terms, each of them corresponding to a precise property which must be
satisfied. While this modeling offers a powerful theoretical framework and minimizers
exist piYI^bj, these models have several disadvantages. First, they lead to very
difficult optimization problems that are notoriously slow to converge [2]. Second,
the weight parameters, which are key ingredients of a wide range of segmentation
energies, are usually not correctly estimated, leading to supervised segmentation
methods. Third, sampling from a Markov random field distribution does
not always produce patterns that look like images.

Object detection belongs to the field of high-level imaging, in which the image 
modeling is on a more global scale compared to low-level imaging which deals 
with (smoothing) prior models on a pixel level. In particular, more global prior 
models for the simply connected objects can be applied. There has been a growing
interest in this field, particularly along the guidelines of Grenander's general
pattern theory using deformable templates 0. Moreover, Zhu et al. attempted
to unify snakes lEl, balloons and region growing methods within a general
energy/Bayes framework [25]. These prior models are generally realistic and
incorporate prior information about the outline of objects in a Bayesian image
analysis framework. In other respects, both approaches estimate the curves that
maximally separate unknown statistics inside and outside the curves 1^. The
maximum a posteriori (MAP) estimate is generally determined by prohibitive
stochastic search procedures |S| or other variants of steepest ascent algorithms
m. Thereby, additional a priori knowledge may be specified to ease the
segmentation task: statistics inside region boundaries are assumed to be known m
or estimated using ad-hoc methods 121121. The global energy functional may
then be optimized, for instance, within a level set framework which
provides the advantages of numerical stability and topological flexibility iEEH.
In practical imaging, these methods may suffer from the problem of initialization
of curves, off-line estimation of the mixture model of Gaussians
approximating the probability density function of the image 1231, or selection of
hyperparameters weighting the contribution of energy terms [232D11I.

In this paper, we address these problems and follow the Bayesian approach
for recovering simply connected objects in the plane. The prior model focuses on
how the area and number of objects can vary in images (Section 2). It allows
the image to be partitioned into few regions, though in a more restrictive manner than
previous approaches since it can generate irregular boundaries. Unlike
other approaches, we shall see that maximizing the posterior distribution
is herein equivalent to selecting a subset of connected components of image bilevel
sets (Section 3). Section 4 presents the numerical implementation of our model
and the computation of the image segmentation. In Section 5, we illustrate this
approach with some experiments on satellite images. Conclusions and
perspectives are presented in Section 6.



2 The Bayesian Framework 

Let $S$ be an open subset of $\mathbb{R}^2$ and $f$ a grey-scale image treated as a function
defined on $S$. In practical imaging, $S$ is a collection of pixels within a discretized
rectangle, and possible values of $f$ are given by integers $[0, 256[ \,\cap\, \mathbb{N}$. Below we
will work in the continuous setup, where $S$ is a subset of a Euclidean space and
$f : S \to \mathbb{R}^+$ represents the observed data function. The continuous setup allows
us to refer to analytic tools, while always leaving a possibility to "discretize"
the problem. We use the terminology "site" or "pixel" to denote a point of
the image, even in the continuous case. Each point $x \in S$ is assigned a grey
value $f(x)$. According to Matheron ^^1) we interpret the image $f$ as a family
of sets defined by $L_\gamma(f) = \{x \in S : f(x) \ge \gamma\}$, $\gamma \in \mathbb{R}^+$. Each level set $L_\gamma(f)$
is assumed to be of finite perimeter. Therefore, $f$ will belong to the bounded
variation (noted BV) space p.
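Let $\{\Omega_i \subset S\}$ be a set of disjoint and non-empty image domains or objects,
and $\{\partial\Omega_i\}$ their boundaries. A partition of the space $S$ consists in finding a set
$\{\Omega_i\}$ and a background $\bar\Omega$ defined as the complementary subset of the union
of objects: $\bar\Omega = S \setminus \bigcup_i \Omega_i$, $\Omega_i \cap \Omega_j = \emptyset$ for $i \ne j$, and $\Omega_i \cap \bar\Omega = \emptyset$.

We assume that the observed image $f$ has been produced by the model
$f = f_{true} + \varepsilon$, where $\varepsilon$ is a zero-mean Gaussian white noise: $\varepsilon(x) \sim \mathcal{N}(0, \sigma^2)$, $x \in S$.
The true image

$$f_{true}(x) = \sum_{i=1}^{P} \bar f_{\Omega_i} 1_{x \in \Omega_i} + \bar f_{\bar\Omega}\, 1_{x \in \bar\Omega}$$

is supposed piecewise constant, where $\bar f_{\Omega_i}$ and $\bar f_{\bar\Omega}$ denote respectively the
unknown average values of $f$ over $\Omega_i$ and $\bar\Omega$, and $1_{x \in E}$ is the set indicator
function of the set $E$. The variance $\sigma^2$ is assumed to be known and constant over
the entire image [25]. So, the likelihood for the data $f$ given $\{\Omega_1, \dots, \Omega_P\}$ is specified by

$$p(f \mid \Omega_1, \dots, \Omega_P) \propto \exp\Big( -\frac{1}{2\sigma^2} \Big[ \sum_{i=1}^{P} \int_{\Omega_i} (f - \bar f_{\Omega_i})^2\,dx + \int_{\bar\Omega} (f - \bar f_{\bar\Omega})^2\,dx \Big] \Big). \quad (1)$$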

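We seek a partition of the rectangle $S$ into a finite set of objects $\Omega_i$, each of
which corresponds to a part of the image where $f$ is constant. Therefore, we
define the following collection $\mathcal{C}_P$ of $P \ge 0$ admissible, closed and connected
objects:

$$\mathcal{C}_P = \Big\{ \{\Omega_1, \dots, \Omega_P\} \subset S \ ;\ S \setminus \bar\Omega = \bigcup_{i=1}^{P} \Omega_i \ ;\ \Omega_i \cap \Omega_j = \emptyset,\ i \ne j \Big\}.$$
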

When $P = 0$, there is no object in the image. Following the Bayesian approach,
we use some functional of the posterior distribution of $\{\Omega_1, \dots, \Omega_P\}$:

$$p(\Omega_1, \dots, \Omega_P \mid f) \propto p(f \mid \Omega_1, \dots, \Omega_P)\, \pi(\Omega_1, \dots, \Omega_P) \quad (2)$$

where $p(f \mid \Omega_1, \dots, \Omega_P)$ is the likelihood given by (1) and $\pi(\Omega_1, \dots, \Omega_P)$ is
the prior distribution of objects. The posterior distribution is used in a further
inferential issue concerning the objects within the Bayesian paradigm. The a
priori distribution should capture the knowledge about $\{\Omega_1, \dots, \Omega_P\}$. We define
a density that penalizes the area $|\Omega_i|$ of objects. Additionally, the variables $\{|\Omega_i|\}$
may be considered as independent random variables with density $g(|\Omega_i|)$. Hence,
the prior distribution is of the form $\pi(\Omega_1, \dots, \Omega_P) = Z_P^{-1} \prod_{i=1}^{P} g(|\Omega_i|)^a$, where
$Z_P$ is a normalization constant and $a$ a real positive value. The density $g(|\Omega_i|)$
is chosen to be a non-negative monotonically decreasing function of the object
area $|\Omega_i|$. For instance, Alvarez et al. have observed experimentally that the
area distribution of homogeneous parts in images follows a power law $\beta |\Omega_i|^{-\gamma}$.
The parameters $\beta$ and $\gamma$ (close to 1,2 for values of $|\Omega_i|$ in a certain range)
give the intensity of the model. In what follows, we shall consider this model
for the density $g(|\Omega_i|)$. There are other possible choices of $g(|\Omega_i|)$; the case of
$g(|\Omega_i|) \propto \exp(-\beta |\Omega_i|^\gamma)$ has already been discussed in jl.'-iil). This model is related
to the Markov connected components fields m.
3 Bayesian Inference

All kinds of inference are made from $p(\Omega_1, \dots, \Omega_P \mid f)$. Finding the maximum
a posteriori (MAP) estimate is herein our choice of inference. As a consequence,
the MAP estimation of objects comes down to the minimization of a global energy
function $E_\lambda(f, \Omega_1, \dots, \Omega_P)$ defined as

$$E_\lambda(f, \Omega_1, \dots, \Omega_P) = \underbrace{\sum_{i=1}^{P} \int_{\Omega_i} (f - \bar f_{\Omega_i})^2\,dx + \int_{\bar\Omega} (f - \bar f_{\bar\Omega})^2\,dx}_{E_d(f, \Omega_1, \dots, \Omega_P)} + \lambda \underbrace{\sum_{i=1}^{P} \big(\gamma \log(|\Omega_i|) - A\big)}_{E_p(\Omega_1, \dots, \Omega_P)} \quad (3)$$

where $E_p(\Omega_1, \dots, \Omega_P)$ is the penalty functional, $E_d(f, \Omega_1, \dots, \Omega_P)$ the data
model, $\lambda = 2a\sigma^2 > 0$ the regularization parameter and $A = \log(\beta)$. The penalty
functional tends to regulate the emergence of objects $\Omega_i$ in the image and gives
no control on the smoothness of boundaries. The regularization parameter $\lambda$
can then be interpreted as a scale parameter that only tunes the number of
regions. If $\lambda = 0$, each point is potentially a region and $\bar\Omega = \emptyset$; the
global minimum coincides with zero and this segmentation is called the "trivial
segmentation" |lYll4j .
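To make the role of the two terms concrete, here is a minimal sketch that evaluates such an energy on a discrete labeled image. It assumes the energy is read as the sum of squared deviations from the region means plus $\lambda \sum_i (\gamma \log|\Omega_i| - \log\beta)$, with label 0 standing for the background and areas counted in pixels; the function name `segmentation_energy` and the parameter values are illustrative assumptions.

```python
import math


def segmentation_energy(image, labels, lam, gamma, beta):
    """Data term plus lam * penalty for a labeled image (label 0 = background)."""
    groups = {}
    for row_values, row_labels in zip(image, labels):
        for value, label in zip(row_values, row_labels):
            groups.setdefault(label, []).append(value)

    data_term = 0.0
    penalty = 0.0
    for label, values in groups.items():
        mean = sum(values) / len(values)
        data_term += sum((v - mean) ** 2 for v in values)
        if label != 0:
            # Only objects are penalized; the area is the pixel count.
            penalty += gamma * math.log(len(values)) - math.log(beta)
    return data_term + lam * penalty


image = [[0.0, 0.0, 10.0, 10.0],
         [0.0, 0.0, 10.0, 10.0]]
one_object = [[0, 0, 1, 1],
              [0, 0, 1, 1]]
no_object = [[0, 0, 0, 0],
             [0, 0, 0, 0]]

print(segmentation_energy(image, one_object, lam=1.0, gamma=2.0, beta=1.0))
print(segmentation_energy(image, no_object, lam=1.0, gamma=2.0, beta=1.0))
```

With these values, declaring the bright block an object removes the whole data term at the cost of a small area penalty, so the one-object partition has the lower energy.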

Our MAP estimator is defined by (when it exists)

$$(\hat\Omega_1, \dots, \hat\Omega_{\hat P}) = \arg\min_{0 \le P \le T}\ \arg\min_{\mathcal{C}_P} E_\lambda(f, \Omega_1, \dots, \Omega_P) \quad (4)$$

where $\mathcal{C}_P \subset \mathcal{C}_T$, $\forall P \le T$, and $T$ is the maximum number of admissible objects
registered in a bank $\mathcal{C}_T$. We recall that $\bar\Omega = S \setminus \bigcup_i \hat\Omega_i$ is the complementary
subset of the estimated objects $\{\hat\Omega_1, \dots, \hat\Omega_{\hat P}\}$. By using classical arguments on
lower semi-continuous functionals on the BV space, we assume here the existence
of minimizers of $E_\lambda(f, \Omega_1, \dots, \Omega_P)$ among sets of finite perimeter (or
of bounded variation) [ I . However, a direct minimization with respect to
all unknown domains $\Omega_i$ and parameters $\bar f_{\Omega_i}$ is a very intricate problem, even
if $T$ is low, since objects are not designed. In what follows (Lemma 1), we prove
that the object boundaries that minimize $E_\lambda(f, \Omega_1, \dots, \Omega_P)$ are level lines of
the function $f$, which makes the problem tractable.

Lemma 1. If minimizers exist and no pathological minimum exists, then
the energy minimizing set of curves is a subset of level lines of $f$:

$$f|_{\partial\Omega_i} = \text{constant}, \quad i = 1, \dots, P,$$

i.e. the border $\partial\Omega_i$ of each $\Omega_i$ is a boundary of a connected component of a level
set of $f$.
Proof of Lemma 1. Let $\Omega_\delta$ be a variation of a set $\Omega$, i.e. the Hausdorff
distance $d_\infty(\Omega_\delta, \Omega) < \delta$. To prove Lemma 1, we assume that, for any connected
perturbation of $\Omega$ such that $d_\infty(\Omega_\delta, \Omega) < \delta$, two neighboring sets $\Omega$ and $\Omega'$ do not
merge into one single set $\Omega \cup \Omega'$ and, for any connected perturbation of $\Omega$
such that $d_\infty(\Omega_\delta, \Omega) < \delta$, $\Omega$ does not split into two new sets. This corresponds to
prohibited topological changes. Without loss of generality, we prove Lemma 1
for one object $\Omega$ and a background $\bar\Omega$, that is, the closure of the complementary
set of $\Omega$. For two sets $A$ and $B$, denote $\int_{A \setminus B} f = \int_A f - \int_B f$. Then, we have

$$\int_{\Omega_\delta \setminus \Omega} f = \int_{\Omega_\delta} f - \int_\Omega f \quad\text{and}\quad \Big(\int_{\Omega_\delta} f\Big)^2 - \Big(\int_\Omega f\Big)^2 = 2 \int_\Omega f \int_{\Omega_\delta \setminus \Omega} f + \Big(\int_{\Omega_\delta \setminus \Omega} f\Big)^2. \quad (5)$$

The difference between the involved energies is defined as
$\Delta E_\lambda(f, \Omega) = E_\lambda(f, \Omega_\delta) - E_\lambda(f, \Omega) = T_1 + T_2 + T_3 + T_4 + T_5$.

[The expressions of the terms $T_1, \dots, T_5$, equation (6), are not recoverable from the source.]

Denote $\Delta|\Omega| = |\Omega_\delta| - |\Omega|$. Using (5), and passing to the limit $\Delta|\Omega| \to 0$, i.e.
$|\Omega_\delta| \approx |\Omega|$, we obtain an expansion of $\Delta E_\lambda(f, \Omega)$ in which higher order terms are neglected.

[Equation (7) is not recoverable from the source.]

We define the following image moments: $m_0 = \int_\Omega 1$, $m_1 = \int_\Omega f$, $K_0 = \int_S 1$,
$K_1 = \int_S f$. Using the mean value theorem for double integrals, which states that
if $f$ is continuous and a connected subset $E$ is bounded by a simple curve, then
for some point $x_0$ in $E$ we have $\int_E f(x)\,dx = f(x_0) \cdot |E|$, where $|E|$ denotes the
area of $E$, it follows that

$$\Delta E_\lambda(f, \Omega) = \Delta|\Omega| \Big\{ \underbrace{\Big[ \frac{m_1^2}{m_0^2} - \frac{(K_1 - m_1)^2}{(K_0 - m_0)^2} + \frac{\lambda\gamma}{m_0} \Big]}_{M_0} + \underbrace{\Big[ \frac{2(K_1 - m_1)}{K_0 - m_0} - \frac{2 m_1}{m_0} \Big]}_{M_1} f(x_0) \Big\}. \quad (8)$$

Let $x_b$ be a fixed point of the border $\partial\Omega$. Choose $\Omega_\delta$ such that $\partial\Omega_\delta = \partial\Omega$ except
on a small neighborhood of $x_b$. The energy having a minimum for $\Omega$, $f(x_b)$ needs
to be a solution of the following equation:

$$\lim_{\Delta|\Omega| \to 0} \frac{\Delta E_\lambda(f, \Omega)}{\Delta|\Omega|} = [M_0 + M_1 f(x_b)] + O(\Delta|\Omega|) = 0. \quad (9)$$

By passing to the limit $\Delta|\Omega| \to 0$, we obtain $M_0 + M_1 f(x_b) = 0$. This equation
has one single solution. The coefficients $M_0$ and $M_1$ depend on neither $x_b$ nor
$f(x_b)$, and $M_0 \ne 0$. The function $f$ is continuous and $\partial\Omega$ is a connected curve.
Therefore $f(x_b)$ is constant when $x_b$ covers $\partial\Omega$. This completes the proof. □

We have proved Lemma 1 with a connected perturbation, including the
situation when $|\Omega_\delta| - |\Omega| = |\partial\Omega|$, where $|\partial\Omega|$ is the boundary length of $\Omega$. Equation
(9) states a necessary condition which is essential to prove that a subset of level
lines globally minimizes the energy.

If $f$ is of bounded variation, the connected components of level sets can be
characterized by their boundaries, that is, the so-called level lines of $f$ 0. As a
consequence of Lemma 1, those curves constitute the borders $\{\partial\Omega_i\}$ of objects.

4 A Stepwise Greedy Algorithm for Image Segmentation 

This section describes our algorithmic procedure for object boundary estimation.
Our recommendations for the concrete choice of the input parameters are
collected in this section. The algorithm we propose requires neither the
number of regions nor any initial mean gray values for regions and background.



4.1 Level Sets and Object Boundaries 

The key ingredient of the procedure is the construction of objects whose
boundaries are image level lines 0. In practical imaging, we can associate with an
image 255 level sets $\{L_\gamma(f)\}$, $0 \le \gamma \le 255$. We consider the scenario where
a point $x$ belongs to one single connected component at once within the image
level sets. We take this fact into account and define the bilevel sets of $f$ as the set
of pixels $x \in S$ such that $v \le f(x) < w$, $0 \le v < w$. Instead of computing all the
255 level sets, we restrict this computation to a small number $K$ ($< 255$) of
level sets and adaptively quantize the image histogram using an entropy method
E3. For $l \in \mathbb{N}$ varying from 1 to $K$, let $b_l$ be the binary image with $b_l(x) = 1$ if
$f(x) \in [t_{l-1}, t_l)$ and $b_l(x) = 0$ otherwise, where $t_l$ is a threshold. We call those
images the $K$-bilevel sets of $f \in [f_{min}, f_{max}]$ P. In general, each bilevel set is
made up of $n(t_l)$ disjoint connected components $\Omega_{t_l,j}$, where $n(t_l)$ is a function of
the threshold $t_l$ and $S = \Omega_{t_1,1} \cup \Omega_{t_1,2} \cup \cdots \cup \Omega_{t_K,n(t_K)}$. A crude way to
build pixel sets corresponding to objects would be to perform a connected
components labeling of the binary images $\{b_l\}$, $1 \le l \le K$, and to associate each
label with an object $\Omega_i$.

If $f$ is bounded, the connected components of level sets can be characterized
by their surrounding curves, that is, the level lines [?]. If we map these level
lines for a given set of $K$ levels, we get a segmentation of the image, also
called a topographic map [?]. More generally, one can consider a segmentation
achieved using only some connected components of level sets, which is the
philosophy of our approach. The most perceptible level lines can be determined
by an isoperimetric criterion [?] or by the detection of T-junctions of level
lines [?]. Both criteria are strong indicators of region boundaries. Instead, we
use herein a simpler criterion where perceptually significant level lines are
the level set boundaries of an image quantized with $K$ quantizers and an
entropy method [?]. Entropy methods seek to maximize the information content
between object and background pixels of an image. The method due to Kapur et al.
chooses the thresholds $\{t_l\}$ to be the values at which the information is
maximum. As a consequence, the detection of meaningful level lines depends on
the quantization parameter $K$. Unlike the previous criteria [?], this
quantization operation is not invariant to contrast changes. Nevertheless, we
shall see that, in practice, $K \in \{4, \dots, 8\}$ seems sufficient to detect
physically meaningful objects in the image.
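
As an illustration of the entropy criterion only, the following sketch computes a
single Kapur-style threshold for a gray-level histogram. It is a simplified
two-class version and not the multi-threshold "maximum entropy sum" procedure
used in this paper; the function name `kapur_threshold` and the toy histogram
are assumptions introduced for the example.

```python
import math


def kapur_threshold(hist):
    """Return the threshold t that maximizes the sum of the entropies of the
    two classes [0, t) and [t, len(hist)) (two-class Kapur criterion)."""
    total = float(sum(hist))
    probs = [h / total for h in hist]

    def class_entropy(ps):
        mass = sum(ps)
        if mass <= 0.0:
            return None  # empty class: this threshold is not admissible
        return -sum((p / mass) * math.log(p / mass) for p in ps if p > 0.0)

    best_t, best_score = None, -math.inf
    for t in range(1, len(probs)):
        h_low = class_entropy(probs[:t])
        h_high = class_entropy(probs[t:])
        if h_low is None or h_high is None:
            continue
        score = h_low + h_high
        if score > best_score:  # ties keep the smallest admissible threshold
            best_t, best_score = t, score
    return best_t


# Toy bimodal histogram over 10 gray levels (hypothetical data).
toy_hist = [10, 10, 10, 0, 0, 0, 0, 10, 10, 10]
toy_threshold = kapur_threshold(toy_hist)
```

For $K$ quantizers, the same score would be summed over $K$ classes and maximized
jointly over $K - 1$ thresholds, which is more costly than this sketch suggests.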



4.2 The Segmentation Procedure 



The proposed algorithm is not a region growing algorithm as described in [?],
since all objects are built once and for all. Although our work is related to
morphological approaches based on connected operators [?], it is an independent
approach since we seek minimizers of a global objective functional. In
addition, it differs from the watershed approach since regions that emerge from
the watershed segmentation are not necessarily connected components within
the image level sets [?].

We post-process the connected components to remove any component whose
surface area $|\Omega_i|$ is less than some threshold $|\Omega_{\min}|$ (a parameter
of the method), in order to eliminate regions corresponding to noise and
artifacts in the original image [?]. To implement our level set image
segmentation based on energy minimization, a four-step method is used. Let $K$,
$\lambda$ and $|\Omega_{\min}|$ be the input parameters set by the user.



1. Bilevel Set Construction. The first step completes a crude mapping of each
image pixel onto a given bilevel set. At present, we quantize the function
$f \in [f_{\min}, f_{\max}]$ into $K \in \{4, \dots, 8\}$ non-equal-sized and
non-overlapping intervals $[t_{l-1}, t_l)$, $l \in \{1, \dots, K\}$. Given this set
of intervals, estimated using the maximum entropy sum method [?], let $b_l$ be
the bilevel set image with $b_l(x) = 1$ if $f(x) \in [t_{l-1}, t_l)$ and
$b_l(x) = 0$ otherwise.

2. Object Extraction. A crude way to build pixel sets corresponding to objects
is to proceed to a connected components labeling of the images $\{b_l\}$ and to
associate each label with an object $\Omega_i$. Though this process may work in
the noise-free case, in general we also need some smoothing of the connected
components labeling. We therefore consider a size-oriented morphological
operator acting on sets, which consists in keeping all connected components of
the output with area larger than a limit $|\Omega_{\min}|$. This connected operator
of mathematical morphology never introduces new features or edges, and the
boundaries of the remaining connected components are preserved [?]. The list of
connected components then forms the bank $\mathcal{C}_T$ of admissible objects
$\{\Omega_1, \dots, \Omega_T\}$ with $|\Omega_i| > |\Omega_{\min}|$.
3. Configuration Determination. The connected components are then combined
during the third step to form object configurations. For instance, these
configurations can be built by enumerating all possible object combinations,
i.e. $2^T$ configurations. Each configuration is made of a subset of objects
taken from the bank $\{\Omega_1, \dots, \Omega_T\}$. The background $\Omega$
corresponds to the complement of the set of objects selected for each
configuration.

4. Energy Computation and Object Configuration Selection. Energy calculations
take the image intensities of the original (not quantized) image to establish
piecewise-constant approximation errors. Energies of the form
$\{\int_{\Omega_i} (f(x) - f_{\Omega_i})^2 \, dx\}$ are computed once and stored in
memory. The energy term $\int_{\Omega} (f(x) - f_{\Omega})^2 \, dx$ is efficiently
updated for each configuration since $\Omega$ is the complement of the union of
objects $\bigcup_i \Omega_i$. The configuration that globally minimizes the energy
functional corresponds to the MAP segmentation. The time necessary to perform
image segmentation essentially depends on the size of the object bank
$\mathcal{C}_T$.
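
The first two steps above can be sketched as follows. This is a minimal
illustration and not the authors' implementation: it assumes that the thresholds
are already known, uses 4-connectivity, represents images as nested lists, and
all function names (`bilevel_sets`, `connected_components`, `build_object_bank`)
are hypothetical.

```python
from collections import deque


def bilevel_sets(image, thresholds):
    """Step 1: one binary image b_l per interval [t_{l-1}, t_l)."""
    return [[[1 if lo <= v < hi else 0 for v in row] for row in image]
            for lo, hi in zip(thresholds[:-1], thresholds[1:])]


def connected_components(binary):
    """4-connected components of a binary image, as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one component
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(pixels)
    return components


def build_object_bank(image, thresholds, min_area):
    """Step 2: keep the components whose area is larger than min_area."""
    bank = []
    for b in bilevel_sets(image, thresholds):
        bank.extend(c for c in connected_components(b) if len(c) > min_area)
    return bank


# Toy 4 x 5 image with three gray-level populations (hypothetical data).
toy_image = [
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 90, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
toy_bank = build_object_bank(toy_image, [0, 64, 128, 256], min_area=1)
```

In this toy case the isolated pixel of value 90 is discarded by the area
criterion, so the bank contains two objects.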

4.3 Computational Issues 

Now we discuss how some parameters of the procedure can be selected and 
indicate one possible choice used in our experimental results. On the discrete 
domain S, the neighborhoods of a pixel x are typically defined via 4-connectivity 
or 8-connectivity. 

Number of Bilevel Sets. The value of $K$ is mainly determined by the number of
meaningful objects that one wishes to extract and the computational effort one
is able to spend. Decreasing $K$ reduces the number of connected components. In
our approach, we determine the optimal configuration of objects by examining a
small set of levels. In practice, our approach successfully segmented various
images using only 4 or 8 levels.

Minimal Area of Objects. The area-oriented operator affects the image by
removing connected components within the image level sets that do not satisfy
the minimum area criterion $|\Omega_{\min}|$. Boundaries of connected components
are not distorted by this operator, as occurs with other types of image filters
(such as openings and closings using structuring elements). Our default choice
is $|\Omega_{\min}| \in [0.0001, 0.001] \times |S|$.

Prior Parameters $A$ and $\gamma$. For fixed $K$, we consider the set of
observations $\{\log(|\Omega_i|), \log(g(|\Omega_i|)), 1 \le i \le T\}$. We perform a
linear regression on this set so as to find the straight line (in log-log
coordinates) $\log(g(|\Omega_i|)) = A - \gamma \log(|\Omega_i|)$ that is closest to
the data in the least squares sense [?].
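
A minimal sketch of such a log-log fit is given below, assuming natural
logarithms and ordinary least squares; the function name `fit_power_law` and the
synthetic data are introduced only for this example.

```python
import math


def fit_power_law(areas, values):
    """Least-squares line log(g) = A - gamma * log(area).

    Returns (A, gamma, sse), where sse is the sum of squared residuals
    in log-log coordinates."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, -slope, sse


# Synthetic observations that follow g(a) = exp(3) * a**(-1.5) exactly.
toy_areas = [1, 2, 4, 8, 16]
toy_values = [math.exp(3.0) * a ** -1.5 for a in toy_areas]
A_hat, gamma_hat, sse_hat = fit_power_law(toy_areas, toy_values)
```

On noise-free data the fit recovers the generating parameters; on measured area
distributions the residual plays the role of the least squares error reported in
the experiments.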

Hyperparameter $\lambda$. The choice of this parameter mostly determines the
properties of the segmentation result. Increasing this parameter reduces the
final number of objects to be extracted. If $f$ is a function from $S$ to
$[0, 255]$, a default choice for the hyperparameter is
$\lambda \in [0.1, 1] \times 255^2$. Of course, larger values of $\lambda$
eventually lead to the extraction of only one object. In practice, this
parameter can be tuned according to the image contents.
Energy Minimization. For a fixed bank $\mathcal{C}_T = \{\Omega_1, \dots, \Omega_T\}$
of $T$ objects, one way to choose the optimal set of objects
$\{\Omega_1, \dots, \Omega_P\}$, $P \le T$, is to search over all possible
combinations of $P$ objects and compute the corresponding energy
$E_\lambda(f, \Omega_1, \dots, \Omega_P)$. Enumerating all possible sets of objects
in the object bank and comparing their energies is computationally too expensive
if $T$ is large (typically, it is infeasible if $T > 32$). Instead of such a
brute-force search, we propose the following stepwise greedy algorithm for
minimizing $E_\lambda(f, \Omega_1, \dots, \Omega_P)$.

We start from $P = 0$ and introduce one object at a time. Energies of all
objects are assumed to be already stored in memory. At the first step, we
compute the $T$ energies with one single object $\Omega_j$ at a time against the
complementary subset $\Omega = S \setminus \Omega_j$. Let $\hat\Omega_1$ be the
estimated object that best lowers $E_\lambda$. This object is stored in memory as
an object of the optimal configuration. It is removed from the initial bank
$\mathcal{C}_T$. At every step of the algorithm, a new object is chosen to
maximally decrease the energy $E_\lambda$.

Suppose that at the $P$-th step, $\hat P$ and $\hat\Omega$ are not known but we
have estimated $P$ objects $\{\hat\Omega_1, \dots, \hat\Omega_P\}$ and a current
background $\Omega = S \setminus \{\hat\Omega_1, \dots, \hat\Omega_P\}$. Let
$E_\lambda(f, \hat\Omega_1, \dots, \hat\Omega_P)$ be the current computed energy.
Then at the $(P+1)$-th step, we choose the object
$\Omega_j \in \mathcal{C}_T \setminus \{\hat\Omega_1, \dots, \hat\Omega_P\}$ which
has the maximal difference, i.e.

    $\hat\Omega_{P+1} = \arg\max_{\Omega_j} \; E_\lambda(f, \hat\Omega_1, \dots, \hat\Omega_P) - E_\lambda(f, \hat\Omega_1, \dots, \hat\Omega_P, \Omega_j)$    (10)

The algorithm stops at the $P$-th step when adding any object does not decrease
$E_\lambda$. This means that the optimal number of objects is $\hat P = P$ and
the remaining objects of the bank are part of the estimated background, i.e.
$\hat\Omega = S \setminus \{\hat\Omega_1, \dots, \hat\Omega_P\}$.

This fast algorithm selects a suboptimal configuration of objects corresponding
to a local minimum of the energy functional. Using this algorithm, at most
$T(T+1)/2$ object configurations are examined, whereas examining all the
configurations corresponds to $2^T$ global energy computations.
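
The stepwise greedy selection can be sketched generically as follows. The energy
is passed in as a callable because its exact form is defined earlier in the
paper; the toy energy used here (a constant minus per-object gains, plus a
penalty per selected object) is purely an assumption for illustration, as are
the names `greedy_select` and `max_evaluations`.

```python
def greedy_select(bank_size, energy):
    """Stepwise greedy minimization.

    `energy(selected)` maps a frozenset of object indices to the energy of
    that configuration. Returns (selected indices, energy, evaluations)."""
    selected = frozenset()
    current = energy(selected)
    evaluations = 0
    while len(selected) < bank_size:
        best_j, best_e = None, current
        for j in range(bank_size):
            if j in selected:
                continue
            evaluations += 1
            e = energy(selected | {j})
            if e < best_e:  # keep the object with the largest decrease
                best_j, best_e = j, e
        if best_j is None:  # no remaining object decreases the energy
            break
        selected, current = selected | {best_j}, best_e
    return sorted(selected), current, evaluations


def max_evaluations(bank_size):
    """Worst case T + (T - 1) + ... + 1 = T (T + 1) / 2 evaluations."""
    return bank_size * (bank_size + 1) // 2


# Toy energy (hypothetical): each object j lowers the data term by gains[j]
# and costs a fixed penalty lam.
gains = [5.0, -1.0, 3.0, 0.5]


def toy_energy(selected, lam=1.0):
    return 10.0 - sum(gains[j] for j in selected) + lam * len(selected)


toy_selected, toy_final_energy, toy_evaluations = greedy_select(4, toy_energy)
```

Note that this sketch also counts the final pass in which no object is accepted,
so its evaluation count can exceed the iteration counts reported in the
experiments by at most one pass.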

5 Experimental Results in Image Segmentation 

Experiments were conducted on satellite and meteorological images to evaluate
the performance of the algorithm. Recall that obtaining the most meaningful
objects is the goal of this work. For this reason, $K$ was set fairly low in the
experiments ($K = 4$ or $K = 8$) to obtain large regions and to improve
robustness to noise and artifacts in the image. Regions whose areas satisfy
$|\Omega_i| < |\Omega_{\min}|$, with $|\Omega_{\min}| \in [0.0001, 0.001] \times |S|$,
are discarded. For our method, $\lambda$ varies across the images depending on
the image contents. It is set empirically, and the values that gave visually
better results were chosen. Most segmentations took about 1-15 seconds on a
296 MHz workstation.

Figure 1 shows an aerial 256 x 256 image (in the visual spectrum) depicting
the region of Saint-Louis during the rising of the Mississippi and Missouri
rivers in July 1993. We are interested in extracting the rivers and a background
(a) Original image. (b) Image histogram. (c) Area distribution.

(d) Topographic map. (e) Segmentation with T = 291 objects. (f) Optimal
segmentation (P = 105).

Fig. 1. Satellite image ($K = 8$, $|\Omega_{\min}| = 0.00025 \times |S|$, $\lambda = 0.25 \times 255^2$).



corresponding to textured urban areas. Figure 1 shows the segmentation results
when $K = 8$, $|\Omega_{\min}| = 0.00025 \times |S|$ and
$\lambda = 0.25 \times 255^2$. In this experiment, the maximum number of
significant components is $T = 291$ (Fig. 1-e). The connected components that do
not satisfy the minimum area criterion are labeled in “white” in Fig. 1. The
image histogram has been quantized with $K = 8$ quantizers and an entropic
method (Fig. 1-b). We estimated the values $A = 3.727$ and $\gamma = 1.486$ by
fitting a straight line $\log(g(|\Omega_i|)) = A - \gamma \log(|\Omega_i|)$ to the
observed data by linear regression. In that case, the least squares error is
2.007 and $17 \le |\Omega_i| \le 2.886 \times 10^4$ pixels. Figure 1 displays the
crude piecewise-constant approximation obtained by setting
$\lambda = 0.25 \times 255^2$. It takes about 15 seconds
($25095 < T(T+1)/2 = 42486$ iterations) of computing time to build
$\mathcal{C}_T$ and select the best configuration ($P = 105$ objects) using the
stepwise greedy algorithm. Enumerating all the configurations is infeasible,
since it would require $2^T \approx 3.98 \times 10^{87}$ iterations. The
non-connected background is labeled in “white” in Fig. 1-f and the objects are
filled with their mean gray values $\{f_{\Omega_i}\}$.

The performance of the segmentation procedure is demonstrated for a satellite
(210 x 148) image shown in Fig. 2. For the set of parameters $K = 4$,
(a) Original image. (b) Area distribution.

(c) Piecewise-constant approximation (T = 207). (d) Optimal piecewise-constant
approximation (P = 65).

Fig. 2. Meteorological image ($K = 4$, $|\Omega_{\min}| = 0.00015 \times |S|$, $\lambda = 0.75 \times 255^2$).



$\lambda = 0.75 \times 255^2$ and $|\Omega_{\min}| = 0.00015 \times |S|$, the
algorithm selected $P = 65$ objects from the bank, which contains $T = 213$
objects (Fig. 2-d). The piecewise-constant approximation of the image using
$T = 213$ objects is shown in Fig. 2-c. The algorithm stopped at the 11765th
iteration (2 s of CPU time), i.e. before the maximal iteration
$T(T+1)/2 = 22791$. We performed a linear regression to estimate the values of
$A$ and $\gamma$ ($5 \le |\Omega_i| \le 1.159 \times 10^{?}$): $A = 1.482$,
$\gamma = 1.202$. This corresponds to a least squares error of 1.34.

6 Conclusion and Perspectives 

In this paper, we have presented a Bayesian approach for extracting structures
in images. The prior model penalizes the area of the homogeneous parts of the
image. Morphological approaches based on connected operators have already
applied such a criterion, but the filtering/segmentation process is generally
not based on the optimization of a global objective functional. In addition, we
proved that our MAP estimator can be determined by selecting a subset of image
level lines. A total CPU time of a few seconds using a suboptimal stepwise
greedy algorithm for partitioning a 256 x 256 image into meaningful regions
makes the method attractive for many time-critical applications. As a direction
for future research, we propose to create a non-linear scale-space by successive
applications of an area morphology operator to select the most meaningful
regions in the image.
Abstract. This paper develops and investigates a new approach for
evaluating feature based object hypotheses in a direct way. The idea is 
to compute a feature likelihood map (FLM), which is a function nor- 
malized to the interval [0, 1], and which approximates the likelihood of 
image features at all points in scale-space. In our case, the FLM is defined 
from Gaussian derivative operators and in such a way that it assumes its 
strongest responses near the centers of symmetric blob-like or elongated 
ridge-like structures and at scales that reflect the size of these struc- 
tures in the image domain. While the FLM inherits several advantages 
of feature based image representations, it also (i) avoids the need for 
explicit search when matching features in object models to image data, 
and (ii) eliminates the need for thresholds present in most traditional 
feature based approaches. In an application presented in this paper, the 
FLM is applied to simultaneous tracking and recognition of hand models 
based on particle filtering. The experiments demonstrate the feasibility 
of the approach, and that real time performance can be obtained by a 
pyramid implementation of the proposed concept. 



1 Introduction 

When interpreting image data, the purpose of filtering is to emphasize and ab- 
stract relevant properties in the data while suppressing others. Common ap- 
proaches for computing image descriptors involve either (i) the computation of 
sparse sets of image features (feature detection) or (ii) the computation of dense 
maps of filter responses (direct methods) . 

In this respect, a main strength of feature based approaches is that they 
provide an abstracted and compact description of the local image shape. Image 
features are usually invariant to absolute intensity values and can selectively 
represent characteristic visual properties of image patterns. In particular, using 
multi-scale feature detection it is possible to estimate the size of image structures 
and to represent image patterns in a scale-invariant manner. Relations between 

* The support from the Swedish Research Council for Engineering Sciences, TFR, 
and from the Royal Swedish Academy of Sciences as well as the Knut and Alice 
Wallenberg Foundation is gratefully acknowledged. 

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 98-UU^ 2001. 
[Image panels at scale levels t = 4, t = 48 and t = 629]

Fig. 1. The result of computing the proposed feature likelihood map on an image of a 
hand. At any point {x, y, t) in scale-space, this function approximates the likelihood of 
symmetric blob-like or ridge-like image structures. In this figure, the feature likelihood 
map is shown for three scale levels t = 4, 48, 629, which (here) are characteristic 
scales for the background, the fingers and the palm of a hand, respectively. Note that 
the response of the map is well localized in space and scale and that the response is 
invariant to the local amplitude of the image structures. 



features in terms of positions, scales, types and other attributes may then be 
effectively used for recognizing objects and image patterns. 

The use of features for image representation, however, also has drawbacks. 
One is that image features may depend on thresholds used for separating relevant 
image structures from noise. This may make the results unstable for patterns 
with low contrast. Another disadvantage is that algorithms involving matching 
of sparse points in image space usually lead to combinatorial complexity. 

The aim of this paper is to develop a dense image representation, which pre- 
serves the advantages of feature-based representation, while avoiding the prob- 
lems of local thresholding and selection of sparse image features for matching. 
The idea is to compute a function on a multi-scale feature space, which is nor- 
malized to the interval [0, 1] and thus independent of the local contrast of the 
grey-level pattern. Moreover, the function will be defined in such a way that its 
response is localized in space and scale, with the strongest responses near the 
centers of blob-like and ridge-like structures. The proposed function, referred 
to as a feature likelihood map, can be used for approximating the likelihood of 
image features. Figure 1 illustrates this concept for an image of a hand.

A main reason behind this construction is to provide means for direct
verification of feature-based object hypotheses. Given a hypothesis, the verification on
such a map does not require explicit search and will therefore be highly efficient. 
In particular, this approach is convenient for object tracking and object recogni- 
tion based on the recently developed approach of particle filtering. In this paper, 
the feature likelihood map will indeed be used for simultaneous hand tracking 
and hand recognition. The viability of the approach will be demonstrated with 
a pyramid implementation, which gives real time performance. 
2 The Feature Likelihood Map

The aim of the proposed likelihood map is to emphasize specific structures in the 
image domain, and to localize them in space and scale. To study this problem, we 
initially restrict ourselves to symmetric blob-like and elongated ridge-like image 
features. The general ideas behind this construction, however, are more general 
and apply to many other aspects of local image structures. 

A general requirement on the proposed feature likelihood map,
$\mathcal{M} : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$, is that for a blob of
size $t_0$ located at a point $(x_0, y_0)$ in space, $\mathcal{M}$ should satisfy
the following basic properties: (i) $\mathcal{M}$ should assume its maximum value
one at $(x_0, y_0; t_0)$, (ii) $\mathcal{M}$ should assume high values in a small
neighborhood of $(x_0, y_0; t_0)$, and (iii) $\mathcal{M}$ should decrease
monotonically towards zero elsewhere. Additionally, $\mathcal{M}$ should not give
preference to blobs of any particular size, position or amplitude, and should
thus be invariant to scalings and translations in the image as well as to local
changes of the image contrast.



2.1 Scale-Space Representation 

For any continuous signal $f : \mathbb{R}^D \to \mathbb{R}$, the linear scale-space
representation $L : \mathbb{R}^D \times \mathbb{R}_+ \to \mathbb{R}$ is defined as
the convolution of $f$ with Gaussian kernels $g$:

    $L(\cdot; t) = g(\cdot; t) * f(\cdot)$,    (1)

where $g(x; t) = \exp(-(x_1^2 + \dots + x_D^2)/(2t)) / (2\pi t)^{D/2}$ and
$x = (x_1, \dots, x_D)^T$. One reason for considering such a representation is
that the Gaussian derivatives

    $L_{x^m}(\cdot; t) = \partial_{x^m}(g * f) = (\partial_{x^m} g) * f = g * (\partial_{x^m} f)$    (2)

(where $m$ denotes the order of differentiation) constitute a canonical set of
filter kernels given natural symmetry requirements on a visual front-end (Witkin
1983, Koenderink and van Doorn 1992, Lindeberg 1994, Florack 1997). Another
reason is that the evolution over scales of a signal and its Gaussian
derivatives provides important cues to local image structure. One such property,
which we will make particular use of here, is based on the behavior over scales
of $\gamma$-normalized Gaussian derivative operators (Lindeberg 1998)

    $L_{\xi^m}(\cdot; t) = t^{m\gamma/2} L_{x^m}(\cdot; t)$,    (3)

where $\xi = x / t^{\gamma/2}$ denotes $\gamma$-normalized coordinates. It can be
shown both theoretically and experimentally that the scales at which such
normalized differential entities assume local maxima over scales reflect
characteristic scales of local image patterns and can thus be used for, for
example, local size estimation.



2.2 Likelihood Map in the 1-D Case 

When we construct the feature likelihood map, let us first consider the
one-dimensional case and take a Gaussian function $f(x) = g(x; x_0, t_0)$ as a
prototype for a blob of size $t_0$ centered at $x_0$ (see Figure 2(a)). Using the
semi-group property of the Gaussian kernel, it follows that the scale-space
representation of $f$ is $L(x; t) = g(x; x_0, t + t_0)$, and its
$\gamma$-normalized second-order derivative is

    $L_{\xi\xi}(\xi; t) = t^{\gamma_2} L_{xx}(x; t) = -\frac{t^{\gamma_2} (t + t_0 - (x - x_0)^2)}{\sqrt{2\pi (t + t_0)^5}} \, e^{-(x - x_0)^2 / (2(t + t_0))}$.    (4)

If we choose $\gamma_2 = 3/4$, then it can be shown (Lindeberg 1998) that
$L_{\xi\xi}$ assumes a local extremum over space and scale at the point
$(x_0, t_0)$ in scale-space that corresponds to the position $x_0$ and the size
$t_0$ of the original blob $f$. Thus, $L_{\xi\xi}$ satisfies some of the required
properties of the desired likelihood map $\mathcal{M}$; it is, however, not
invariant to the local amplitude of the signal (see Figure 2(b)).
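
The scale selection property can be checked numerically with a small sketch. It
assumes the closed form of $L_{xx}$ for a Gaussian blob and the normalization
$t^{\gamma_2} L_{xx}$; the function names and the scale grid are choices made
only for this illustration.

```python
import math


def normalized_second_derivative(x, t, x0=0.0, t0=4.0, gamma2=0.75):
    """t**gamma2 * L_xx(x; t) for the scale-space of a Gaussian blob
    g(x; x0, t0), using L(x; t) = g(x; x0, t + t0)."""
    s = t + t0
    u = x - x0
    gauss = math.exp(-u * u / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
    return t ** gamma2 * (u * u - s) / (s * s) * gauss


def selected_scale(t0, gamma2=0.75, step=0.01, steps=4000):
    """Scale on a regular grid where the magnitude of the normalized
    second derivative at the blob center is largest."""
    scales = [step * k for k in range(1, steps + 1)]
    return max(
        scales,
        key=lambda t: abs(normalized_second_derivative(0.0, t, 0.0, t0, gamma2)),
    )
```

At the blob center the magnitude is proportional to
$t^{\gamma_2} (t + t_0)^{-3/2}$, whose maximum lies at
$t = \gamma_2 t_0 / (3/2 - \gamma_2)$, that is, at $t = t_0$ when
$\gamma_2 = 3/4$.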



Fig. 2. (a): Gaussian kernels $g(x; t)$ of various widths; (b): Evolution over
scales of the second order normalized derivative operator $L_{\xi\xi}$ in the case
when $\gamma_2 = 3/4$.



Quasi-quadrature. A standard approach for amplitude estimation in signal
processing is in terms of quadrature filter pairs $(h_+, h_-)$, from which the
amplitude can be estimated as $Q = (h_+ * f)^2 + (h_- * f)^2$. Strictly, a
quadrature filter pair is defined from a Hilbert transform, in such a way that
$Q$ is phase-independent. Within the framework of scale-space derivatives, the
quadrature entity $Q$ for first- and second-order derivatives can be
approximated by a pair of normalized first- and second-order Gaussian derivative
operators (Lindeberg 1998):

    $Q_1 L = A L_\xi^2 + L_{\xi\xi}^2 = A t^{\gamma_1} L_x^2 + t^{2\gamma_2} L_{xx}^2$,    (5)

where $A$ is a constant and $L_\xi = t^{\gamma_1/2} L_x(x; t)$ is the normalized
first-order derivative operator, where we choose $\gamma_1 = 1/2$ to match
$\gamma_2 = 3/4$. Moreover, the value of $A$ can be chosen as $A \approx 4/e$,
such that the response of $Q_1 L$ is approximately constant over space in the
neighborhood of $(x_0; t_0)$ (Lindeberg 1998). This quadrature entity is,
however, not phase-independent over scales.

Including Stability over Scales. To include the stability of image structures
over scales (corresponding to low values of derivatives with respect to scale),
and to also increase approximate phase invariance with respect to space and
scale simultaneously, we propose to include the derivative of $L_{\xi\xi}$ with
respect to effective scale $\tau = \log t$. Using $\partial_\tau = t \partial_t$ and
the fact that all Gaussian derivatives satisfy the diffusion equation
$\partial_t(L_{x^\alpha}) = \frac{1}{2} \partial_{xx}(L_{x^\alpha})$, it follows that:

    $L_{\xi\xi\tau}(\xi; t) = \partial_\tau(L_{\xi\xi}(\xi; t)) = \gamma_3 t^{\gamma_3} L_{xx} + t^{\gamma_3 + 1} L_{xxt} = \gamma_3 t^{\gamma_3} L_{xx} + \frac{t^{\gamma_3 + 1}}{2} L_{xxxx}$.    (6)

By adding this expression to (5), we thus propose to extend $Q_1 L$ into

    $Q_2 L = A L_\xi^2 + B L_{\xi\xi\tau}^2 + L_{\xi\xi}^2$.    (7)

Figures 3(a) and (b) illustrate the evolution of the components in this
expression, i.e. $L_\xi$, $L_{\xi\xi}$ and $L_{\xi\xi\tau}$, over space and scale. As
can be seen, the responses of $L_\xi$ and $L_{\xi\xi\tau}$ complement the response
of $L_{\xi\xi}$ by assuming high values where $L_{\xi\xi}$ is low and vice versa.
Thus, one can expect that by an appropriate choice of the weights $A$ and $B$,
$Q_2 L$ will be approximately constant in a neighborhood of $(x_0, t_0)$. Such a
behavior is illustrated in Figures 3(c) and (d).




a Gaussian blob centered at $x_0 = 0$ and with variance $t_0 = 4$; (c)-(d): Evolution of
$L_{\xi\xi}^2$ and $Q_2 L$ when using the parameter values $A = 1$ and $B = 2.8$. Note that $Q_2 L$ is
approximately constant over space and scale in the neighborhood of $(x_0, t_0)$.



Invariance Properties. If we consider the ratio $L_{\xi\xi}^2 / Q_2 L$, it is
apparent that the amplitude cancels between the numerator and the denominator.
Thus, we achieve local contrast invariance. Moreover, since
$Q_2 L \ge L_{\xi\xi}^2 \ge 0$, it follows that the ratio $L_{\xi\xi}^2 / Q_2 L$ will
always be in the range [0, 1]. Scale invariance of $Q_2 L$ holds if, for
$\gamma_2 = 3/4$, we take $\gamma_1 = 1/2$. Moreover, it can be shown that for a
Gaussian blob, the scale-space maximum of $Q_2 L$ is assumed at $t_0$ if and only
if $\gamma_3 = 1$. The relative magnitudes of $L_{\xi\xi}^2$ and $Q_2 L$ are
illustrated in Figures 3(c) and (d).

To conclude, the ratio $L_{\xi\xi}^2 / Q_2 L$ satisfies all the stated requirements
on the feature likelihood map, and we define

    $\mathcal{M} = \frac{L_{\xi\xi}^2}{Q_2 L} = \frac{L_{\xi\xi}^2}{A L_\xi^2 + B L_{\xi\xi\tau}^2 + L_{\xi\xi}^2}$.    (8)

Determination of the Free Parameters $A$ and $B$. Concerning the choice of $A$
and $B$, it can be verified that $A \approx 1$ and $B \approx 3$ give an
approximately constant behavior of the denominator of $\mathcal{M}$ around
$(x_0; t_0)$. This was the original design criterion when the quasi quadrature
entity (5) was proposed. Figure 4(a) shows the behavior of $\mathcal{M}$ in this
case. Notably, the peak around $(x_0; t_0)$ is rather wide in the scale
direction, and there are two quite strong side lobes in the spatial direction.
For the purpose of dense scale selection with application to recognition, it is
desirable to have a narrower and more localized response with respect to scale
and space. For this reason, we increase the parameters to $A = 10$ and
$B = 100$ and obtain the desired behavior of $\mathcal{M}$, as illustrated in
Figure 4(b).
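
A hedged numerical sketch of the one-dimensional map for a Gaussian blob is
given below. It uses closed-form Gaussian derivatives and assumes the
normalizations $L_\xi = t^{\gamma_1/2} L_x$, $L_{\xi\xi} = t^{\gamma_2} L_{xx}$ and
$L_{\xi\xi\tau} = \gamma_3 t^{\gamma_3} L_{xx} + \frac{1}{2} t^{\gamma_3 + 1} L_{xxxx}$;
the function name `flm_1d` and the default parameter values are choices made for
this example only.

```python
import math


def flm_1d(x, t, amplitude=1.0, x0=0.0, t0=4.0,
           A=10.0, B=100.0, gamma1=0.5, gamma2=0.75, gamma3=1.0):
    """Feature likelihood map M(x; t) for the signal amplitude * g(x; x0, t0)."""
    s, u = t + t0, x - x0
    g = amplitude * math.exp(-u * u / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
    # Closed-form spatial derivatives of the smoothed Gaussian blob.
    l_x = -u / s * g
    l_xx = (u * u - s) / s ** 2 * g
    l_xxxx = (u ** 4 - 6.0 * u * u * s + 3.0 * s * s) / s ** 4 * g
    # Normalized derivatives (assumed conventions, see the lead-in).
    l_xi = t ** (gamma1 / 2.0) * l_x
    l_xixi = t ** gamma2 * l_xx
    l_xixitau = gamma3 * t ** gamma3 * l_xx + 0.5 * t ** (gamma3 + 1.0) * l_xxxx
    denominator = A * l_xi ** 2 + B * l_xixitau ** 2 + l_xixi ** 2
    return l_xixi ** 2 / denominator
```

Because every term of the ratio is quadratic in the signal, multiplying the blob
by a constant leaves the value unchanged, and the value always lies in [0, 1].
For points very far from the blob the Gaussian underflows and the ratio is no
longer defined, which this sketch does not guard against.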



[Plot panels: (a) $\mathcal{M}(x, t)$ for $A = 1$, $B = 2.8$; (b) $\mathcal{M}(x, t)$ for $A = 10$, $B = 100$]

Fig. 4. Evolution of the likelihood map $\mathcal{M}$ over space and scale for different values
of the parameters $A$ and $B$ when applied to a Gaussian blob ($x_0 = 0$, $t_0 = 4$).



2.3 Likelihood Map in the 2-D Case 

The likelihood map defined in (8) can be easily extended to two dimensions.
Consider again a Gaussian kernel $f = g(x, y; x_0, y_0, t_0)$ as a prototype image
blob of size $t_0$ centered at $(x_0, y_0)$. The scale-space representation of
this signal is given by $L(x, y; t) = g(x, y; x_0, y_0, t + t_0)$, and the
normalized Laplacian operator

    $\nabla^2_{norm} L = L_{\xi\xi} + L_{\eta\eta} = t^{\gamma_2} (L_{xx}(x, y; t) + L_{yy}(x, y; t))$    (9)

assumes a local extremum at $(x_0, y_0; t_0)$ if $\gamma_2 = 1$. To construct a
quadrature entity $Q$, we consider the gradient magnitude (with $\gamma_1 = 1$)

    $|\nabla_{norm} L| = \sqrt{L_\xi^2 + L_\eta^2}$

as the analogue to $L_\xi$ in the one-dimensional case, and take

    $\partial_\tau(\nabla^2_{norm} L) = L_{\xi\xi\tau} + L_{\eta\eta\tau} = \gamma_3 t^{\gamma_3} (L_{xx} + L_{yy}) + \frac{t^{\gamma_3 + 1}}{2} (L_{xxxx} + L_{yyyy} + 2 L_{xxyy})$    (10)

as the analogue to $L_{\xi\xi\tau}$. Then, we define the feature likelihood map as

    $\mathcal{M}_L = \frac{(L_{\xi\xi} + L_{\eta\eta})^2}{A (L_\xi^2 + L_\eta^2) + B (L_{\xi\xi\tau} + L_{\eta\eta\tau})^2 + (L_{\xi\xi} + L_{\eta\eta})^2}$.

Clearly, $\mathcal{M}_L$ is rotationally invariant, and invariant with respect to
scale and local contrast; it assumes values in the range [0, 1], and for a
Gaussian blob the maximum value 1 is assumed at $(x_0, y_0; t_0)$. Hence,
$\mathcal{M}_L$ has essentially similar properties to the likelihood map (8) in
the one-dimensional case. Figures 5(a)-(c) illustrate how, with $A = 10$ and
$B = 100$, $\mathcal{M}_L$ assumes a rather sharp maximum at $(x_0, y_0; t_0)$ and
rapidly decreases with deviations from this point.






[Plot panels (a)-(c): $\mathcal{M}_L(x, y)$ at $t = 0.5\,t_0$, $t = t_0$ and $t = 4\,t_0$]

Fig. 5. Evolution of the likelihood map $\mathcal{M}_L$ over space and scale for a two-dimensional
Gaussian blob defined by ($x_0 = 0$, $y_0 = 0$, $t_0 = 1$). Plots in (a), (b) and (c) illustrate
$\mathcal{M}_L$ for scale values t = 0.5, 1 and 4.



Suppression of Saddle Regions and Noise. Besides blobs and ridges, however,
$\mathcal{M}_L$ will also respond to certain saddle points. This occurs when
$\nabla_{norm} L = 0$ and $\partial_\tau(\nabla^2_{norm} L) = 0$. To suppress such
points, we introduce a saddle suppression factor

    $\mu = \frac{\lambda_1^2 + \lambda_2^2 + 2\lambda_1\lambda_2}{\lambda_1^2 + \lambda_2^2 + 2|\lambda_1\lambda_2|} = \frac{(L_{xx} + L_{yy})^2}{L_{xx}^2 + L_{yy}^2 + 2 L_{xy}^2 + 2|L_{xx} L_{yy} - L_{xy}^2|}$,

where $\lambda_1$ and $\lambda_2$ denote the eigenvalues of the Hessian matrix.
Then, it can be seen that $\mu$ is equal to one when $\lambda_1$ and $\lambda_2$
have the same sign (i.e., for emphasized blob and ridge structures), while $\mu$
decreases towards zero if $\lambda_1$ and $\lambda_2$ have equal magnitude and
opposite sign. Moreover, to suppress the influence of spurious noise structures
of amplitude lower than $\varepsilon_N$, we introduce a small normalising
parameter $\varepsilon_N$ in the denominator of the expression for the FLM.
Thus, we define a saddle- and noise-suppressed feature likelihood map as



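    $\tilde{\mathcal{M}}_L = \mu^k \mathcal{M}_L = \frac{\mu^k (L_{\xi\xi} + L_{\eta\eta})^2}{A (L_\xi^2 + L_\eta^2) + B (L_{\xi\xi\tau} + L_{\eta\eta\tau})^2 + (L_{\xi\xi} + L_{\eta\eta})^2 + \varepsilon_N}$.    (13)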



Examples of other feature likelihood maps, with exclusive emphasis on specific 
types of image structures are presented in (Lindeberg 2001). 
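
The saddle suppression factor introduced above can be evaluated directly from
the second-order derivatives, without computing the eigenvalues. The sketch
below relies on the identities $\lambda_1 + \lambda_2 = L_{xx} + L_{yy}$ and
$\lambda_1 \lambda_2 = L_{xx} L_{yy} - L_{xy}^2$; the function name
`saddle_suppression` and the handling of a vanishing Hessian are assumptions of
this example.

```python
def saddle_suppression(l_xx, l_yy, l_xy):
    """Saddle suppression factor from the entries of the Hessian matrix.

    Equals 1 when both eigenvalues have the same sign and 0 when they have
    equal magnitude and opposite sign."""
    numerator = (l_xx + l_yy) ** 2
    denominator = (l_xx ** 2 + l_yy ** 2 + 2.0 * l_xy ** 2
                   + 2.0 * abs(l_xx * l_yy - l_xy ** 2))
    if denominator == 0.0:
        return 0.0  # flat region: no structure to emphasize (assumed convention)
    return numerator / denominator
```

For example, a Hessian with entries (2, 2, 1) has eigenvalues 3 and 1 and yields
a factor of one, whereas the pure saddle (1, -1, 0) yields zero.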



2.4 Experiments on Synthetic and Real Data 

Figure 6 shows the result of computing this feature likelihood map for a
synthetic image with three Gaussian blobs. As can be observed, the high values
of $\tilde{\mathcal{M}}_L$ are well localized in space and scale, and the peaks
over space and scale correspond to the positions and the sizes of the original
blobs. Figure 6 also shows the result of computing $\tilde{\mathcal{M}}_L$ for an
image of a hand. Here, it can be seen that $\tilde{\mathcal{M}}_L$ responds not
only to circular structures but also to elongated ridge-like structures, such as
fingers. The reason for this is that the Laplacian operator, besides responding
to circular blob-like structures, also gives a reasonably high response to
elongated structures. From these results it can be clearly seen how
$\tilde{\mathcal{M}}_L$ separates small structures in the background, the fingers
and the palm of a hand. Moreover, despite the varying contrast of the image
structures, $\tilde{\mathcal{M}}_L$ gives equally high response to weak ridges in
the background and to the fingers of higher contrast. In many cases, this is a
desirable property of a recognition system aimed at classifying local image
structures irrespective of illumination variations.



3 Hand Tracking and Recognition 

To experimentally investigate the proposed direct approach for evaluation of
feature hypotheses, we will in this section present an application of the
feature likelihood map in combination with particle filtering for simultaneous
tracking and recognition of hands in image sequences. By necessity the
presentation is heavily condensed; more details can be found in (Laptev and
Lindeberg 2001).

3.1 Hand Model

An image of a hand can be expected to give rise to blob and ridge features
corresponding to the fingers and the palm of a hand. These image structures,
together with information about their relative orientation, position and scale,
can be used for defining a simple but discriminative, view-based model of a
hand. Thus, we represent a hand by a set of blob and ridge features as
illustrated in Figure 7 and define different hand states, depending on the
number of open fingers.

To model translations, rotations and scalings of hands, we define a parameter
vector $X = (x, y, s, \alpha, l)$ which describes the global position $(x, y)$, the size $s$



t=128 t=256 t=512 



Fig. 6. The result of computing feature likelihood map on a synthetic image and a 
real image, where in the second case, the response of the FLM has been set to zero for 
points with > 0, in order to enhance the response to bright image structures. From 
the first image, it can be verified that the FLM gives a correct localization of blobs in 
space and scale. In the second image, it can be seen that the FLM clearly separates 
different image structures, such as the fingers and the palm of a hand, according to 
their size.

and the orientation α of a hand in the image, together with its discrete state
l = 1, ..., 5. The vector X uniquely identifies the hand configuration in the image
and estimation of X from image sequences corresponds to simultaneous hand
tracking and recognition.




Fig. 7. Feature-based hand models in different states. The circles and ellipses corre- 
spond to blob and ridge features. When aligning models to images, the features are 
translated, scaled and rotated according to the parameter vector X. 



3.2 Model Evaluation 

Given a feature-based object model, the feature likelihood map provides a direct
way to evaluate the model on image data. To obtain the likelihood that a model
configuration X gives rise to an image I, one can simply multiply the likelihood
values for model features, which are directly available from M. Hence, we define
the likelihood p for a model hypothesis X and an image I as

n 

p{I\X) = (14) 

i=l 

where M is computed on the image I, $x_i$, $y_i$ and $t_i$ denote the position and the
size of the feature in the model, while $\varepsilon \in (0, 1)$ accounts for a maximal
admissible matching error and enables comparison of models with different
numbers of features n ($N = \max_j(n_j)$). In addition, this likelihood is multiplied
by a prior on skin colour computed from colour histograms of human hands.

Notably, the described evaluation does not involve any search and is simple 
and efficient to compute. Therefore it is highly useful for real-time applications. 
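
The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the array layout `flm[level, row, col]`, the list `levels` of scale values, the nearest-level lookup and the rule that a feature scale is multiplied by the squared hand size are assumptions made for the example. The factor involving the admissible matching error and the skin-colour prior are left out.

```python
import numpy as np

def evaluate_hypothesis(flm, levels, features, X):
    """Multiply likelihood-map values at the transformed model features.

    flm      : array (n_levels, height, width) with values in [0, 1] (assumed layout)
    levels   : 1D array of the scale values t of each level (assumed)
    features : list of (x, y, t) in the model frame, one list per hand state
    X        : (x0, y0, s, alpha) = position, size and orientation of the hand
    """
    x0, y0, s, alpha = X
    c, si = np.cos(alpha), np.sin(alpha)
    p = 1.0
    for fx, fy, ft in features:
        # rotate, scale and translate the feature into image coordinates
        ix = x0 + s * (c * fx - si * fy)
        iy = y0 + s * (si * fx + c * fy)
        it = ft * s * s  # assumption: t behaves like a variance
        k = int(np.argmin(np.abs(np.log(levels) - np.log(it))))  # nearest level
        r, q = int(round(iy)), int(round(ix))
        if not (0 <= r < flm.shape[1] and 0 <= q < flm.shape[2]):
            return 0.0  # a feature outside the image rules the hypothesis out
        p *= flm[k, r, q]
    return p
```

No search is involved: each hypothesis costs one table lookup per model feature, which is why the evaluation suits sampling-based trackers.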

3.3 Tracking and Recognition 

To detect, recognize and track hands in image sequences, we search for a hand
configuration defined by a parameter vector $X_k$ that maximizes the posterior
distribution $p(X_k | I_k)$ on a given image $I_k$ at a time moment k. Using Bayes'
rule, the posterior can be estimated by

$$p(X_k | I_k) = \kappa \, p(I_k | X_k) \, p(X_k | I_{k-1}) \qquad (15)$$

where $p(I_k | X_k)$ is the likelihood of $X_k$ given $I_k$, $p(X_k | I_{k-1})$ is the prior
distribution of $X_k$ derived from a previous time step and $\kappa$ is a normalization constant.

Since the likelihood distribution above has no closed-form expression, the desired
posterior must be approximated. For this reason, we apply particle filtering to
estimate and approximate the posterior by a set of N samples (here $N \approx 1000$)
distributed in a parameter space (see (Isard and Blake 1996) for an introduction).
Given the posterior $p(X_k | I_k)$, we compute its mean $X_{k,mean}$ and consider
it as the estimate of a hand pose at time moment k.
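
One iteration of such a filter can be sketched as follows. This is a generic bootstrap particle filter under stated assumptions, not the authors' code: the Gaussian diffusion used as motion model, the name `particle_filter_step` and the restriction to continuous parameters (the discrete hand state is ignored) are choices made for the illustration only.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, noise_std, rng):
    """One resample / predict / weight cycle of a bootstrap particle filter.

    particles : array (n, d) of parameter vectors
    weights   : array (n,) of normalised posterior weights from the last frame
    likelihood: function mapping one parameter vector to p(I | X)
    noise_std : scalar or array (d,), standard deviation of the assumed diffusion
    """
    n = len(particles)
    # resample according to the previous posterior
    idx = rng.choice(n, size=n, p=weights)
    # predict: diffuse the samples (assumed motion model)
    predicted = particles[idx] + rng.normal(0.0, noise_std, size=particles.shape)
    # weight each sample by its likelihood and normalise
    w = np.array([likelihood(x) for x in predicted], dtype=float)
    total = w.sum()
    w = w / total if total > 0 else np.full(n, 1.0 / n)
    # the posterior mean serves as the pose estimate
    mean = (w[:, None] * predicted).sum(axis=0)
    return predicted, w, mean
```

Almost all the work happens in the calls to `likelihood`, so the cost of one frame is roughly the number of samples times the cost of one hypothesis evaluation.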

Particle filters spend most of their time on evaluating the likelihood of model 
hypotheses (samples) . As described in the previous section, the proposed feature 
likelihood map is highly efficient for this purpose and we use it for evaluating 
the likelihood of samples within the framework of particle filtering. The efficient 
evaluation enables recognition and tracking to be done in real-time (currently
at a frame rate of 5-10 Hz). Figure 8 illustrates the result of combined tracking
and recognition using the described framework.




tracking of a rescaling and translating hand 



simultaneous tracking and recognition of hand poses 




Fig. 8. Results of combined hand tracking and pose recognition using particle filtering 
and evaluation of feature-based hand models on feature likelihood maps. 



3.4 Implementation Details 

In practice, the above-mentioned scheme has been implemented in a pyramid
framework using a fixed set of scale levels. The resolution at scale level $t_i$ was
obtained by sub-sampling the original image with a factor $\sqrt{t_i / t_f}$, and
the derivatives have been computed using filter kernels of fixed scale $t_f$. In the
experiments, we found $t_f \approx 2.0$ to be sufficiently large for obtaining a satisfactory
quality of M on one hand, while on the other hand being sufficiently small to
enable fast computations. On a modest 550 MHz Pentium III processor our
current implementation (without extensive optimization) requires about 0.1 s to
compute the feature likelihood map on a 100 x 100 image and about 0.04 s to
perform the particle filtering using 1000 hypotheses.



4 Related Work 

The subject of this paper relates to multi-scale approaches for image representation,
computation of differential invariants, detection of image features as well
as tracking and recognition of view-based object models. Because of the scope
of these areas, it is not possible to give an extensive review, and only a few
closely related works will be mentioned. Crowley and Sanderson (1987) considered
a graph-like image representation containing links between blobs at different
scales. Pizer et al. (1994) proposed the use of multi-scale medial-axis representations
computed directly from image patterns distributions. Multi-scale image
differential invariants (Koenderink and van Doorn 1992, Lindeberg 1994, Florack 1997)
have been computed by several authors, including Schmid and Mohr
(1997) who apply such descriptors at interest points for image indexing and retrieval.
Explicit scale selection for extraction of multi-scale image features has
been investigated by Lindeberg (1998). A similar approach by Shokoufandeh et
al. (1999) extracts extrema in a wavelet transform. Lindeberg (1998), Chomat et
al. (2000) and Almansa and Lindeberg (2000) have computed dense descriptors
for estimating the characteristic scale at any image point. With respect to object
tracking, Isard and Blake (1996) developed a particle filtering approach for
tracking contour-based models. Black and Jepson (1998) used eigenspace models
of gray-value patterns for tracking deformable models. The approach by Bretzner
and Lindeberg (1999) is closer to ours and applies a hierarchy of multi-scale
features for representing and tracking hands.

5 Summary and Future Work 

In this paper, we have presented a new approach for probabilistic and dense 
image representation by feature likelihood maps. Such maps are invariant to the 
amplitude of patterns and emphasize local structures in images by assuming high 
values at certain points in feature space. We derived the feature likelihood map 
for symmetric blob-like image structures and analyzed its behavior on synthetic 
and real images. Using the dense structure of the feature likelihood map, we have 
shown how it can be applied for direct and efficient evaluation of feature-based 
object hypotheses. Based on this evaluation procedure, we developed a particle 
filtering approach for recognizing and tracking hands in image sequences. 

By analogy with the developed likelihood map for symmetric blob-like struc- 
tures, similar maps can be constructed for other types of local image structures. 
For this purpose, the expression in (HU must be redefined by substituting the 
normalized Laplacian operator and its quadrature by other differential entities 
emphasizing the desired image properties. Examples of other feature likelihood 
maps constructed in this way are presented in (Laptev and Lindeberg 2001), as 
well as ways to incorporate colour information into this framework.

Another interesting direction of future research concerns the extension of
feature likelihood maps to the spatio-temporal domain. Here, the general ideas of
this presentation could be combined with the concept of normalized derivatives
in spatio-temporal scale-space. The resulting maps could then be used in order
to analyze, capture and recognize temporal events in image sequences.

Abstract. We propose a variational model which allows one to simultaneously
deblur and oversample an image. After recalling an existing variational
model for image oversampling, we show how to modify it in order to properly
achieve our two goals. We discuss the modification both from a theoretical
point of view (the analysis of the preservation of some structural elements) and
from the practical point of view of experimental results. We also describe the algorithm
used to compute a solution to this model.



1 Introduction 

This paper deals with a variational method whose aim is to both deblur and oversample
an image. More precisely, for $N \in \mathbb{N}$, noting $T_N$ the torus of size N (the periodization
of $[0, N)^2$), we expect to recover an image $v \in L^2(T_N)$, from a data $u \in \mathbb{R}^{N^2}$, such
that

$$u_{m,n} = (s * v)(m, n) + b_{m,n}$$

where $s \in L^2(T_N)$, $(m, n) \in \{0, ..., N-1\}^2$ and $b \in \mathbb{R}^{N^2}$ is a Gaussian noise. Note
that here the convolution is made between two functions of $T_N$. For convenience, in the
following, we will denote by $\mathbb{N}_N$ the periodization of $\{0, ..., N-1\}$.

A very useful framework for this kind of problem is the Fourier domain. We define
the Fourier transform of a function $v \in L^2(T_N)$ by

$$\hat v_{\xi,\eta} = \int_{T_N} v(x, y)\, e^{-2i\pi(\xi x + \eta y)}\, dx\, dy ,$$

for $(\xi, \eta) \in \frac{1}{N}\mathbb{Z}^2$. The discrete Fourier transform of $u \in \mathbb{R}^{N^2}$ is defined by

$$\hat u_{\xi,\eta} = \sum_{m,n=0}^{N-1} u_{m,n}\, e^{-2i\pi\frac{\xi m + \eta n}{N}} ,$$

for $(\xi, \eta) \in \{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2$.

Using the Poisson formula (see ITDIl, p. 29), we can express the discrete Fourier
transform of u in terms of the discrete Fourier transform of b and the Fourier transforms
of s and v. This gives

$$\hat u_{\xi,\eta} = \sum_{k,l \in \mathbb{Z}} \hat s_{\frac{\xi}{N}+k,\frac{\eta}{N}+l}\; \hat v_{\frac{\xi}{N}+k,\frac{\eta}{N}+l} + \hat b_{\xi,\eta}$$

for any $(\xi, \eta) \in \{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2$. Therefore, we remark that if the function v satisfies,
for any $(\xi, \eta) \in \{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2$, $\hat v_{\frac{\xi}{N}+k,\frac{\eta}{N}+l} = 0$ for $k \neq 0$ or $l \neq 0$, this would be a
deblurring problem. On the other hand, if we do not take into account the blurring and
the noise but try to extrapolate the high frequencies, this is an interpolation problem.
However, we do believe that these two issues cannot be separated and should be treated
simultaneously. This is what we will propose in what follows.
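
The aliasing expressed by the Poisson formula has a simple discrete analogue that can be checked numerically. The sketch below is an illustration under assumptions, not part of the original method: the fine grid of size KN x KN stands in for the torus, the blur and the noise are left out, and the factor 1/K^2 comes from NumPy's unnormalised DFT convention. The function name `fold_spectrum` is hypothetical.

```python
import numpy as np

def fold_spectrum(V, N, K):
    """Sum the K*K aliases (xi + k*N, eta + l*N) of a (K*N, K*N) spectrum."""
    return V.reshape(K, N, K, N).sum(axis=(0, 2))

rng = np.random.default_rng(0)
N, K = 8, 2
v = rng.standard_normal((K * N, K * N))  # fine-grid image (stand-in for s * v)
u = v[::K, ::K]                          # sampling on the coarse grid, no noise

U = np.fft.fft2(u)
folded = fold_spectrum(np.fft.fft2(v), N, K) / K**2
print(np.allclose(U, folded))
```

Each coarse frequency receives the sum of all fine frequencies that differ from it by a multiple of N, so many different fine spectra produce the same samples. This is the ambiguity that the regularization has to resolve.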

There are only a few papers which deal with the possibility to simultaneously deblur
and oversample an image, although there is an extensive literature on both image
deblurring and oversampling. Concerning image deblurring, the reader can refer to H
for most of the linear methods, to wni for variational ones (respectively based on a 
regularization with the entropy and the total variation) and to GHIHl for wavelet packet 
based methods. Concerning oversampling, most of the linear methods tend to compute 
or approximate the sinc-interpolation (see 1141 fT^ T). Non-linear methods often try to
adapt the filter to the particular behavior of the image (edge, smooth region,...) (see 
El El) or use a regularization approach [0E1- 

The paper is organized as follows. In Sect. 2 we recall a variational
oversampling method introduced in 0. Then, in Sect. 3, we show how to adapt this
model in order to take into account the noise b. We also show that, with regard to the
analysis of the preservation of a family of structural elements, this model has to be
modified. We then explain the numerical scheme which is used to compute a solution
of this model. Finally, in Sect. 4, we present some numerical experiments which confirm
the importance of the modifications we have proposed in Sect. 3.

2 Variational Oversampling of Noise Free Images 

All the results announced in this section are rigorously stated and proven in [0. In
that paper, we studied the possibility to oversample images by means of a Maximum A
Posteriori model. More precisely, we studied a variational oversampling method based
on the minimization of the total variation. This method consists in finding an image
$w \in L^2(T_N)$ which

$$\text{minimizes } \int_{T_N} |\nabla w| , \text{ among } w \in W_{s,u} , \qquad (1)$$

where, for any given data $u \in \mathbb{R}^{N^2}$ and any convolution kernel $s \in L^2(T_N)$, we define

$$W_{s,u} = \{ w \in L^2(T_N) ,\ \forall (m, n) \in (\mathbb{N}_N)^2 ,\ s * w(m, n) = u_{m,n} \} .$$



Total Variation Based Oversampling of Noisy Images 113 



Remark that, for simplicity, we note the total variation $\int_{T_N} |\nabla w|$ instead of $|Dw|(T_N)$.

We know that (1) has a solution as long as s is such that $W_{s,u}$ is not empty and
all the elements of $W_{s,u}$ have the same mean. We cannot guarantee the uniqueness of
this solution. However, we are sure that two different solutions have locally the same
level lines at locations where these latter are properly defined and the solutions are $C^1$.
We also know a discretization of (1) which permits to properly approximate one of its
solutions. All these mathematical properties guarantee this problem to be well posed.

However, a drawback of this model is that if s is too localized in the space domain,
some points are not sufficiently constrained by the data fidelity term. For instance, if
$s = \delta$ (the Dirac delta function), the points of $(\mathbb{N}_N)^2$ are the only ones involved in the
constraint in (1) and, since $(\mathbb{N}_N)^2$ is of measure zero in $T_N$, the solutions of (1) are
constant functions.* Therefore, the question arises of whether s sufficiently
spreads the constraint over the whole domain or not.

We can give an answer to that question by investigating the preservation of structural
elements that are the “cylindrical functions”. These functions are basically 1D functions
and were first introduced to model the ability of an oversampling method to properly
restore edges. They are rigorously defined by

Definition 1. Let $u \in \mathbb{R}^{N^2}$ and $(\alpha, \beta) \in \mathbb{R}^2 \setminus \{(0,0)\}$. u is cylindrical along the
direction $(\alpha, \beta)$ if and only if its Discrete Fourier Transform is supported by



Given this definition, we can state

Proposition 1. Let N be an integer, $(\alpha, \beta) \in \mathbb{R}^2 \setminus \{(0,0)\}$, $u \in \mathbb{R}^{N^2}$ cylindrical
along the direction $(\alpha, \beta)$. For any kernel $s \in L^2(T_N)$, such that



(1) admits a solution cylindrical along the same direction $(\alpha, \beta)$.

Remark that this proposition can be extended to other kinds of convex regularity
criteria, provided that the reformulation of (1) with this new regularity criterion has a
solution. However, the advantage of the total variation is that it allows to reconstruct some
high frequencies in a more complex manner than just filling them with 0. The strong
and thin lines on Fig. 1 represent the spectrum of a variational oversampling (by (1)) of
a cylindrical function. Once again, this has already been discussed in [2] where one can
find lots of experiments on this property. Here, we interpret the possibility to preserve
these 1D structures as a criterion to decide whether the constraint is sufficiently spread
or not. Indeed, if it is not the case, we obtain artifacts such as the ones presented on Fig. 3
and Fig. 4 where 1D structures are broken.

We are conscious of the fact that there could be more precise conditions on s in order
to spread the constraint over $T_N$ (indeed, we do not have any theoretical argument to
assert that the condition given in Proposition 1 is necessary or optimal).

* Note that here we cheat a little since when $s = \delta$ all the elements of $W_{s,u}$ do not have the
same mean and we are therefore not sure of the existence of a solution. Note also that this is
coherent with the fact that constant functions (which are the “solutions” of (1) in this case) do,
a priori, not belong to $W_{s,u}$.

Fig. 1. The strong line represents the spectral support of a function cylindrical in the
direction (2, 1). The thin line represents the support of the spectrum of the oversampling
of this image according to (1).



3 Variational Oversampling of Noisy Images

3.1 The Model

The model of the preceding section does not take into account the corruption of the
image by noise. This leads us to introduce a model where the constraint is “weaker”
than in (1). The following model is very close to the one of Rudin-Osher-Fatemi (see
Q) for the deblurring issue. However, this time, we do take into account the sampling
and the aliasing as being part of the degradation process. This modification has already
been evoked but not fully explored in ||H1| .

More precisely, an adaptation of the Rudin-Osher-Fatemi method which would take into
account the sampling process could be the minimization, among $w \in BV(T_N)$, of

$$\int_{T_N} |\nabla w| + \lambda \sum_{m,n=0}^{N-1} |s * w(m, n) - u_{m,n}|^2 \qquad (2)$$

for a parameter $\lambda > 0$.

However, as before, a drawback of this model is that the data fidelity
term may not be sufficiently spread over the whole torus (it may be concentrated in the
vicinity of points with integer coordinates). We can, however, state a result similar to
Proposition 1 for this model, which suggests modifying (2) in such a way that we avoid
this artifact.

Proposition 2. Let N be an integer, $(\alpha, \beta) \in \mathbb{R}^2 \setminus \{(0,0)\}$, $u \in \mathbb{R}^{N^2}$ cylindrical
along the direction $(\alpha, \beta)$. For any kernel $s \in L^2(T_N)$, such that

$$\hat s_{\frac{\xi}{N},\frac{\eta}{N}} \begin{cases} \neq 0 , & \text{for } (\xi, \eta) \in \{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2 \\ = 0 , & \text{otherwise,} \end{cases}$$

the minimization of (2) admits a solution cylindrical along the same direction $(\alpha, \beta)$.

The proof of this proposition is very close to the one of Proposition 1 (see 10).
Proposition 2 suggests modifying (2) and minimizing

$$\int_{T_N} |\nabla w| + \lambda \sum_{m,n=0}^{N-1} |\tilde s * w(m, n) - u_{m,n}|^2 , \qquad (3)$$

where $\lambda$ is a parameter and $\tilde s$ is defined by

$$\hat{\tilde s}_{\frac{\xi}{N},\frac{\eta}{N}} = \begin{cases} \hat s_{\frac{\xi}{N},\frac{\eta}{N}} , & \text{if } (\xi, \eta) \in \{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2 \\ 0 , & \text{otherwise.} \end{cases}$$



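In order to understand the consequence of this modification, we express, using the Poisson
formula, the data fidelity term of (2) in the frequency domain. We find that

$$\sum_{m,n=0}^{N-1} |s * w(m, n) - u_{m,n}|^2 = \frac{1}{N^2} \sum_{\xi,\eta=0}^{N-1} \Big| \sum_{k,l \in \mathbb{Z}} \hat s_{\frac{\xi}{N}+k,\frac{\eta}{N}+l}\; \hat w_{\frac{\xi}{N}+k,\frac{\eta}{N}+l} - \hat u_{\xi,\eta} \Big|^2 . \qquad (4)$$

Therefore, any change in the repartition of $\hat w$ over the frequencies $(\frac{\xi}{N}+k, \frac{\eta}{N}+l)_{k,l \in \mathbb{Z}}$
such that $\sum_{k,l \in \mathbb{Z}} \hat s_{\frac{\xi}{N}+k,\frac{\eta}{N}+l}\, \hat w_{\frac{\xi}{N}+k,\frac{\eta}{N}+l}$ remains unchanged yields the same value for this
data fidelity term. Therefore, the minimization of (2) may spread $\hat w$ over these coefficients.
This yields, in the space domain, a result which looks like a sum of functions
which are almost Dirac delta functions.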

^ One can refer to @ for a definition of $BV(T_N)$. It can heuristically be understood as the
space of the functions $w \in L^1(T_N)$ such that $\int |\nabla w| < \infty$.



116 François Malgouyres



On the other hand, a formula similar to (4), for the model (3), shows that the data
fidelity term deals only with low frequencies (the ones in $\{-\frac{N}{2}+1, ..., \frac{N}{2}\}^2$). Therefore,
we are sure to preserve the main structures of the image. The heuristic of this model is
to consider the aliasing as a noise.



3.2 Numerical Implementation 

In Sect. 4 we present some images which are solutions of (2) and (3). These solutions
are computed using the same, more general algorithm: one which minimizes (2) for
an arbitrary kernel s. Let us describe this algorithm.

The first issue in order to minimize (2) is to discretize it. Therefore, we have only
considered oversampling of integer level K. This means that in practice the result is
simply an array of size KN x KN. Moreover, we have discretized $\nabla w$ by a simple
finite difference scheme and defined the partial derivatives

$$\Delta_x w_{m,n} = w_{m+1,n} - w_{m,n} \quad \text{and} \quad \Delta_y w_{m,n} = w_{m,n+1} - w_{m,n} .$$

Moreover, in order to have a proper descent direction, we have replaced the total
variation by

$$\sum_{m,n=0}^{KN-1} \sqrt{\beta^2 + (\Delta_x w_{m,n})^2 + (\Delta_y w_{m,n})^2} ,$$

for $\beta \in \mathbb{R}$. We call $E_\beta$ the sum of this term and of the data fidelity term. Note that,
in practice, we let $\beta$ decrease to 0 during the iteration process. These ideas are now
classical and are already discussed in II] 0 ].
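
A smoothed total variation of this kind and its gradient can be sketched as follows. The exact form used by the author is not fully legible in the source, so the forward differences with periodic boundary, the placement of $\beta^2$ under the square root and the function names are assumptions made for this illustration.

```python
import numpy as np

def smoothed_tv(w, beta):
    """Sum over pixels of sqrt(beta^2 + dx^2 + dy^2), periodic boundary."""
    dx = np.roll(w, -1, axis=1) - w  # forward differences on the torus
    dy = np.roll(w, -1, axis=0) - w
    return float(np.sqrt(beta**2 + dx**2 + dy**2).sum())

def smoothed_tv_grad(w, beta):
    """Gradient of smoothed_tv with respect to every pixel of w."""
    dx = np.roll(w, -1, axis=1) - w
    dy = np.roll(w, -1, axis=0) - w
    norm = np.sqrt(beta**2 + dx**2 + dy**2)
    px, py = dx / norm, dy / norm
    # the adjoint of a forward difference is minus a backward difference
    return (np.roll(px, 1, axis=1) - px) + (np.roll(py, 1, axis=0) - py)
```

With $\beta > 0$ the square root never vanishes, so the gradient is defined everywhere, including flat regions where the plain total variation is not differentiable. Letting $\beta$ shrink brings the functional back towards the total variation.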

As we said previously, the main difference with the usual minimization of the
Rudin-Osher-Fatemi functional is that the data fidelity term now takes into account the
sampling process. The computation of the data fidelity term and of its gradient are in this
case simpler in the Fourier domain. Therefore, we express it by an adaptation of (4).

More precisely, (4) becomes now

$$\sum_{m,n=0}^{N-1} |s * w(m, n) - u_{m,n}|^2 = \frac{1}{N^2} \sum_{\xi,\eta=0}^{N-1} \Big| \sum_{k,l=0}^{K-1} \hat s_{\xi+kN,\eta+lN}\; \hat w_{\xi+kN,\eta+lN} - \hat u_{\xi,\eta} \Big|^2 ,$$

where the hats denote either the discrete Fourier transform of a signal of size N x N or
KN x KN.

Therefore, a simple computation shows that the Fourier transform of the
gradient of this term is, at the frequency $(\xi + kN, \eta + lN)$, for $(\xi, \eta) \in \{0, ..., N-1\}^2$
and $(k, l) \in \{0, ..., K-1\}^2$,

$$2\, \overline{\hat s_{\xi+kN,\eta+lN}} \Big( \sum_{k',l'=0}^{K-1} \hat s_{\xi+k'N,\eta+l'N}\; \hat w_{\xi+k'N,\eta+l'N} - \hat u_{\xi,\eta} \Big) ,$$



where the bar denotes the complex conjugate.

Finally, the discretization of the functional (2) is minimized with a gradient descent
algorithm with an optimal step. More precisely, we start from the zero-padding (or
sinc interpolation, see |H3) oversampling of the blurred image and call it $u^0$. To get
$u^{j+1}$ from $u^j$, for $j \in \mathbb{N}$, the gradient of the functional at $u^j$ is calculated,
as explained above, at each step of the algorithm. Then the optimal amplitude of the
variation of the image in the direction $-\nabla E_\beta(u^j)$ is estimated by the resolution of

$$\min_{s > 0} E_\beta(u^j - s \nabla E_\beta(u^j))$$

using a dichotomy method. Once the optimal amplitude $s_0$ is calculated, we let

$$u^{j+1} = u^j - s_0 \nabla E_\beta(u^j) .$$

We then iterate this process.

Note that in order to increase the speed of this algorithm, we can start from a
deblurred version of $u^0$ instead of $u^0$. Moreover, it is better to start with a large $\beta$ and to
let it decrease to 0.
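
The data fidelity term, its gradient and the descent with a line search can be sketched in Python as follows. This is an illustration under assumptions rather than the author's code: the kernel is assumed real and given by its FFT `s_hat` on the KN x KN grid, normalisation constants follow NumPy's DFT convention, the interval-shrinking search in `optimal_step` merely stands in for the dichotomy method, and the bound `t_max` on the step and all function names are hypothetical.

```python
import numpy as np

def data_term(w, s_hat, u, K):
    """Squared error between the sampled blur of w and the data u."""
    blurred = np.real(np.fft.ifft2(s_hat * np.fft.fft2(w)))
    r = blurred[::K, ::K] - u
    return float((r ** 2).sum()), r

def data_term_grad(w, s_hat, u, K):
    """Gradient of data_term: conj(s_hat) times the replicated residual spectrum."""
    _, r = data_term(w, s_hat, u, K)
    up = np.zeros_like(w)
    up[::K, ::K] = r  # zero insertion repeats the residual spectrum K*K times
    return 2.0 * np.real(np.fft.ifft2(np.conj(s_hat) * np.fft.fft2(up)))

def optimal_step(phi, t_max=1.0, iters=40):
    """Minimise a unimodal function phi on [0, t_max] by shrinking the interval."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if phi(a) < phi(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def descend(energy, grad, w0, steps=20, t_max=1.0):
    """Gradient descent where each step length is chosen by optimal_step."""
    w = w0.copy()
    for _ in range(steps):
        g = grad(w)
        t = optimal_step(lambda t: energy(w - t * g), t_max)
        w = w - t * g
    return w
```

In a full solver the energy and gradient passed to `descend` would be the sum of this data term, weighted by the parameter, and a smoothed total variation term. Only the data term is shown here, since it is the part that differs from the usual deblurring setting.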

4 Experiments 

All the images presented here come from manipulations (degradations and reconstructions)
of the image displayed on Fig. 2. Moreover, for simplicity of display, all the
experiments deal with downsampling and oversampling of factor 2.

4.1 The Noise Free Case

We present in this section some experiments which demonstrate the relevance of
our interpretation of Proposition 2 when there is no noise. Remark that in such a case,
the choice of the parameter $\lambda$ in (2) and (3) is arbitrary as long as $\lambda$ is sufficiently large
and the sampled image does not contain too much aliasing.

On Fig. 3 we display some extracted part of

- Up-Left: the reference image.

- Up-Right: the downsampled image (with $s = \delta$) of the reference image (without
noise).

- Down-Left: an oversampling of the downsampled image by means of (2), with $s = \delta$
and $\lambda = 10$. Since in (2) we took $s = \delta$, we clearly see on this image the points
of the grid which are constrained. Note that this drawback is still present for all the
values of $\lambda$ we have tested.

- Down-Right: an oversampling of the downsampled image by means of (3), with
$\lambda = 10$. Note that the points which were visible on the previous image are no
longer present on this image.

On Fig. 4 we display

Fig. 2. Reference Image. This is the image used in all the experiments of Section 4.




Fig. 3. Up-Left: The initial image. Up-Right: The downsampling of the initial image
with $s = \delta$. Down-Left: Oversampling by minimizing (2). Down-Right: Oversampling
by minimizing (3).

Fig. 4. Up-Left: The initial image. Up-Right: The downsampling of the initial image
with s = l|[_i i] 2. Down-Left: Oversampling by minimizing (2). Down-Right: Oversampling
by minimizing (3).



- Up-Left: the reference image.

- Up-Right: the downsampled image (with s = l|[_i i] 2 ) of the reference image
(without noise).

- Down-Left: an oversampling of the downsampled image by means of (2), with s =
l|[_i i ]2 and $\lambda = 10$. The texture is distorted. Note that the only way to avoid
this distortion is to remove the texture. We experimentally find that this occurs for
$\lambda \leq 0.1$.

- Down-Right: an oversampling of the downsampled image by means of (3), with $\lambda = 10$.
This time the texture is preserved. This illustrates the interest of Proposition 2
and of the modification introduced in (3).

This latter experiment shows that even when the sequence $(s * w(m, n))_{(m,n) \in (\mathbb{N}_N)^2}$
depends on all the values^ of $w(x, y)$, the modification suggested by Proposition 2 still
permits to improve the method.



^ More precisely, there does not exist any open set $\Omega$ such that, for any arbitrary modification of
w on $\Omega$, $(s * w(m, n))_{(m,n) \in (\mathbb{N}_N)^2}$ remains unchanged.

4.2 The Noisy Case

Let us now investigate the possibility to remove noise while oversampling and deblurring
the image.




Fig. 5. Up-Left: The initial image. Up-Right: The downsampling of the initial image
with s = l|[_i i]2 plus a Gaussian noise of standard deviation 3. Down-Left: Oversampling
by minimizing (2). Down-Right: Oversampling by minimizing (3).



In order to test our algorithm, we do the same experiment as for the creation of Fig. 4
(the one with s = l|[_i i]2) except that we add a Gaussian noise of standard deviation
3. We display the results of this experiment on Fig. 5. For both reconstructed images
the parameter $\lambda$ is fixed in such a way that the amount of remaining noise is reasonable
in a homogeneous region (see Fig. 6, where a larger part of the reconstruction by means
of (3) is displayed). We take $\lambda = 0.5$ for the oversampling by means of (2) and $\lambda = 0.3$
for the one which uses (3). Note that the fact that for a comparable amount of noise we
need a smaller $\lambda$ with model (3) is not surprising, since the data fidelity term of (3)
constrains the image more than the one of (2). The main comment on these images is
that, despite the noise, our interpretation of Proposition 2 still makes sense.

We also compare our results to the ones obtained by a simple combination of linear
algorithms. Once again, we compare them for the downsampling of the image displayed
on Fig. 2 with s = l|[_i i ]2 and a Gaussian noise of standard deviation 3. Therefore,
we display on Fig. 6, on the left, an oversampling obtained by composing a Wiener filter
(see 01) applied to the sampled image, to deblur and denoise it, and a sinc-interpolation
(see Cl), in order to oversample it (Up: the result, Down: its spectrum). We tried to fix
the parameter of the Wiener filter (the assumed standard deviation of the noise) in order
to have the same amount of noise on this image as on the oversampling by means of (3).
However, it is not possible to find such a value for the parameter without removing most
of the information contained in the image. Therefore, the left images displayed on Fig. 6
correspond to a value of $\sigma = 20$ and still contain a significant amount of noise.

The images on the right hand side of Fig. 6 correspond to the result and its spectrum
when minimizing (3) for a parameter $\lambda = 0.3$. We clearly see here that (3) permits to
obtain a result which contains less noise and is sharper than with the previous method.
This is also visible on the spectrum of these images. In the first case, we just fill in the
high frequencies with zero and with (3), we rebuild a realistic spectrum out of the initial
spectral domain.

Fig. 6. Left: Oversampling with a linear filter (Wiener filter + sinc-interpolation) (Up:
The image, Down: Its spectrum). Right: Oversampling by means of (3) (Up: The image,
Down: Its spectrum).

Abstract. The most popular lossy image compression method used on the Internet
is the JPEG standard. JPEG’s good compression performance and low computational
and memory complexity make it an attractive method for natural image
compression. Nevertheless, as we go to low bit rates that imply lower quality, 

JPEG introduces disturbing artifacts. It appears that at low bit rates a down- 
scaled image when JPEG compressed visually beats the high resolution image 
compressed via JPEG to be represented with the same number of bits. 

Motivated by this idea, we show how down-sampling an image to a low resolu- 
tion, then using JPEG at the lower resolution, and subsequently interpolating the 
result to the original resolution can improve the overall PSNR performance of the 
compression process. We give an analytical model and a numerical analysis of the 
sub-sampling, compression and re-scaling process, that makes explicit the possi- 
ble quality/compression trade-offs. We show that the image auto-correlation can 
provide good estimates for establishing the down-sampling factor that achieves 
optimal performance. Given a specific budget of bits, we determine the down 
sampling factor necessary to get the best possible recovered image in terms of 
PSNR. 

1 Introduction 

The most popular lossy image compression method used on the Internet is the JPEG 
standard. JPEG uses the Discrete Cosine Transform (DCT) on image blocks of size 
8x8 pixels. The fact that the JPEG operates on small blocks is motivated by the non- 
stationarity of the image, and the need to approximate the Karhunen Loeve Transform 
(KLT) for 2D Markov processes. A quality measure determines the (uniform) quantiza- 
tion steps for each of the 64 DCT coefficients. The quantized coefficients of each block 
are then zigzag-scanned into one vector that goes through a run-length coding of the 
zero sequences, thereby clustering long insignificant low energy coefficients into short 
and compact descriptors. Finally, the run-length sequence is fed to an entropy coder, 
that can be a Huffman coding algorithm with either a known dictionary or a dictio- 
nary extracted from the specific statistics of the given image. A different alternative 
supported by the standard is arithmetic coding. 

JPEG’s good compression performance and low computational and memory complexity
make it an attractive method for natural image compression. Nevertheless, as
we go to low bit rates that imply lower quality, the JPEG compression algorithm
introduces disturbing artifacts. It appears that at low bit rates a down-scaled image, when
JPEG compressed and later interpolated, visually beats the high resolution image
compressed directly via JPEG using the same number of bits. An experimental result
displayed in Figure 1 shows that both visually and in terms of the Mean Square Error (or
PSNR), one obtains better results using down-scaling compression and up-scaling after
the decompression.

nology, Technion City, Haifa 32000, Israel.





Fig. 1. Original image (on the left), JPEG compressed-decompressed image (middle),
and down-scaled, JPEG compressed-decompressed and up-scaled image (right). The
down-scaling factor was 0.5. In both cases, the compression ratio is 40. The MSE in the
upper row is 219.5 (left) and 193.12 (right). Similarly, in the lower row: 256.04 (left)
and 248.42 (right).



In this paper we propose an analytical explanation for the above phenomenon, along
with a practical algorithm to automatically choose the optimal scaling factor for best
PSNR. We derive an analytical model of the compression-decompression reconstruction
error as a function of the memory (bits) budget, the (statistical) characteristics of
the image, and the scale factor. We show that a simplistic second order statistical model
provides a good estimate of the down-sampling factor that achieves optimal performance.

This report is organized as follows. Sections QBBlprssent the analytic model, and 
explore its implications. Section 0 describes an experimental setup that validates the 
proposed model and its applicability for choosing best scaling factor for a given im- 
age with a given bits budget. Finally, Section 0 ends the paper with some concluding 
remarks.

2 Analysis of a Continuous “JPEG-Style” Image Representation Model

In this section we start building a theoretical model for analyzing the expected
reconstruction error when doing compression-decompression as a function of the total bits
budget, the characteristics of the image, and the scale factor. Our model considers the
image over a continuous domain rather than a discrete one, in order to simplify the
derivation. The steps we follow are:

- Derivation of the expected compression-decompression error for a general image
representation process, based on slicing the image domain into M by N blocks.

- Derivation of an expression for the error by exploiting the fact that the coding is
done in the transform domain using an orthonormal basis, and assuming that the
error is due only to truncating the transform coefficients.

- Extension of the expression for the error to include quantization error of the
non-truncated coefficients.

- Extension of the formal error to take into account the fact that the transform is the
DCT, i.e. the orthonormal basis is cosine functions.

- Including in the expression for the error an approximation of the quantization errors
due to various policies of allocating the total bits budget.

At the end of this process we obtain an expression for the error as a function of the bits 
budget, scale factor, and the image characteristics. This function eventually allows us 
to determine the optimal scale-down factor in JPEG-like image coding. 

2.1 Compression-Decompression Expected Error 

Assume we are given images on the unit square, $f_w(x,y) : [0,1] \times [0,1] \to \mathbb{R}$, 
realizations of a 2D random process $\{f_w(x,y)\}$, with second-order statistics given 
by 

$$E[f_w(x,y)] = 0, \qquad R(x, y, x+\tau_x, y+\tau_y) = r_0^2\, e^{-\alpha_x|\tau_x|}\, e^{-\alpha_y|\tau_y|}.$$



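We assume that the image domain $[0,1] \times [0,1]$ is sliced into $M \cdot N$ regions of the form 

$$\Delta_{ij} = \left[\frac{i-1}{M}, \frac{i}{M}\right] \times \left[\frac{j-1}{N}, \frac{j}{N}\right], \qquad i = 1, 2, \dots, M, \quad j = 1, 2, \dots, N.$$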



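Assume that due to our coding of the original image $f_w(x,y)$ we obtain the 
compressed-decompressed result $\hat{f}_w(x,y)$, which is an approximation of the original image. 
We can measure the error in approximating $f_w(x,y)$ by $\hat{f}_w(x,y)$ as follows: 

$$\begin{aligned}
\varepsilon_w^2 &= \iint_{[0,1]\times[0,1]} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy \\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} \iint_{\Delta_{ij}} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy \\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} \mathrm{Area}(\Delta_{ij})\,\frac{1}{\mathrm{Area}(\Delta_{ij})} \iint_{\Delta_{ij}} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy \\
&= \frac{1}{M N} \sum_{i=1}^{M}\sum_{j=1}^{N} \mathrm{MSE}_w(\Delta_{ij}),
\end{aligned}$$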


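where we define $\mathrm{MSE}_w(\Delta_{ij}) = \frac{1}{\mathrm{Area}(\Delta_{ij})} \iint_{\Delta_{ij}} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy$. We 
shall, of course, be interested in the expected mean square error of the digitization, i.e., 

$$E[\varepsilon_w^2] = \frac{1}{M N} \sum_{i=1}^{M}\sum_{j=1}^{N} E\left[\frac{1}{\mathrm{Area}(\Delta_{ij})} \iint_{\Delta_{ij}} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy\right].$$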



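Note that the assumed wide-sense stationarity of the image process results in the 
fact that $E[\mathrm{MSE}_w(\Delta_{ij})]$ is independent of $(i,j)$, i.e., we have the same expected mean 
square error over each slice of the image. Thus we can write 

$$E[\varepsilon_w^2] = M N \cdot \frac{1}{M N}\, E[\mathrm{MSE}_w(\Delta_{11})] = E\left[\frac{1}{\mathrm{Area}(\Delta_{11})} \iint_{\Delta_{11}} \big(f_w(x,y) - \hat{f}_w(x,y)\big)^2\,dx\,dy\right].$$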



Up to now we considered the quality measure used to evaluate the approximation of 
$f_w(x,y)$ in the digitization process. We shall next consider the set of basis functions 
needed for representing $f_w(x,y)$ over each slice. 



2.2 Bases for Representing $f_w(x,y)$ over Slices 



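In order to represent the image over each slice $\Delta_{ij}$, we have to choose an orthonormal 
basis of functions. Denote this basis by $\{\varphi_{kl}(x,y)\}_{k,l=0,1,2,\dots}$. We must have 

$$\iint_{\Delta_{ij}} \varphi_{kl}\,\varphi_{k'l'}\,dx\,dy = \delta_{kk'}\,\delta_{ll'} = \begin{cases} 1 & \text{if } (k,l) = (k',l'), \\ 0 & \text{otherwise.} \end{cases}$$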



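If $\{\varphi_{kl}\}$ is indeed a basis, then we can write 
$f_w(x,y) = \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} \langle f_w(x,y), \varphi_{kl}(x,y)\rangle\, \varphi_{kl}(x,y)$ 
as a representation of $f_w(x,y)$ over $\Delta_{ij}$ in terms of an infinite 
set of coefficients 

$$F_{kl} = \langle f_w(x,y), \varphi_{kl}(x,y)\rangle = \iint_{\Delta_{ij}} f_w(x,y)\,\varphi_{kl}(x,y)\,dx\,dy.$$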



Suppose now that we approximate $f_w(x,y)$ over $\Delta_{ij}$ by using only a finite set $\Omega$ of 
the orthonormal functions $\{\varphi_{kl}(x,y)\}$, i.e., consider 

$$\hat{f}_w(x,y) = \sum_{(k,l)\in\Omega} \langle f_w(x,y), \varphi_{kl}(x,y)\rangle\, \varphi_{kl}(x,y).$$
(It is easy to see that the optimal coefficients in the approximation above turn out to be 
the corresponding $F_{kl}$'s from the infinite representation.) The mean square error of 
this approximation, over $\Delta_{11}$ say, will be 



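$$\mathrm{MSE}_w(\Delta_{11}) = M N \left[\iint_{\Delta_{11}} f_w^2(x,y)\,dx\,dy - 2\iint_{\Delta_{11}} f_w(x,y)\,\hat{f}_w(x,y)\,dx\,dy + \iint_{\Delta_{11}} \hat{f}_w^2(x,y)\,dx\,dy\right].$$

Hence, 

$$\mathrm{MSE}_w(\Delta_{11}) = M N \left[\iint_{\Delta_{11}} f_w^2(x,y)\,dx\,dy - \sum_{(k,l)\in\Omega} \langle f_w(x,y), \varphi_{kl}(x,y)\rangle^2\right].$$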



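Now the expected $\mathrm{MSE}_w(\Delta_{11})$ will be: 

$$E[\mathrm{MSE}_w(\Delta_{11})] = M N \left[\iint_{\Delta_{11}} E[f_w^2(x,y)]\,dx\,dy - \sum_{(k,l)\in\Omega} E\big[\langle f_w(x,y), \varphi_{kl}(x,y)\rangle^2\big]\right].$$

Hence 

$$E[\varepsilon_w^2] = M N \cdot r_0^2 \cdot \frac{1}{M N} - M N \sum_{(k,l)\in\Omega} E[F_{kl}^2] = r_0^2 - M N \sum_{(k,l)\in\Omega} E[F_{kl}^2].$$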
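
The truncation identity above (error energy equals signal energy minus the energy of the retained coefficients) can be checked numerically. The following Python sketch does so in 1D on $[0,1]$ with the cosine basis $\sqrt{2-\delta_k}\cos(k\pi x)$ and a midpoint-rule discretization. The test signal, the grid size `n`, and all function names are illustrative assumptions and are not taken from the paper. 

```python
import math


def cosine_basis(k, x):
    """Orthonormal cosine basis on [0, 1]: sqrt(2 - delta_k) * cos(k*pi*x)."""
    scale = 1.0 if k == 0 else math.sqrt(2.0)
    return scale * math.cos(k * math.pi * x)


def truncation_check(f, num_terms, n=2000):
    """Return (direct error, energy minus retained coefficient energy).

    Integrals are approximated with the midpoint rule on n cells.
    """
    xs = [(i + 0.5) / n for i in range(n)]
    fx = [f(x) for x in xs]
    # Expansion coefficients F_k = <f, phi_k>.
    coeffs = [
        sum(v * cosine_basis(k, x) for v, x in zip(fx, xs)) / n
        for k in range(num_terms)
    ]
    # Truncated reconstruction using only the first num_terms functions.
    approx = [
        sum(c * cosine_basis(k, x) for k, c in enumerate(coeffs)) for x in xs
    ]
    direct = sum((v - a) ** 2 for v, a in zip(fx, approx)) / n
    parseval = sum(v * v for v in fx) / n - sum(c * c for c in coeffs)
    return direct, parseval


direct, parseval = truncation_check(lambda x: math.exp(-3.0 * x), num_terms=8)
print(direct, parseval)
```

The two printed numbers should agree up to rounding, because the sampled cosines are also orthonormal under the midpoint rule. 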



2.3 The Effect of Quantization of the Expansion Coefficients $F_{kl}$ 

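Suppose that in the approximation $\hat{f}_w(x,y) = \sum_{(k,l)\in\Omega} F_{kl}\,\varphi_{kl}(x,y)$ we can only 
use a finite number of bits in representing the coefficients, which take values in $\mathbb{R}$. 
If $F_{kl}$ is represented (encoded) with $b_{kl}$ bits, we shall be able to describe it via $F^Q_{kl}$, which 
takes on $2^{b_{kl}}$ values only, i.e., $F^Q_{kl} = Q_{b_{kl}}(F_{kl}) : \mathbb{R} \to$ set of $2^{b_{kl}}$ representation levels. 
The error in representing $F_{kl}$ in this way is $F^e_{kl} = F_{kl} - F^Q_{kl}$. Let us now see how 
the quantization errors affect $\mathrm{MSE}_w(\Delta_{11})$. We have 

$$\mathrm{MSE}^Q_w(\Delta_{11}) = M N \iint_{\Delta_{11}} \big(f_w(x,y) - \hat{f}^Q_w(x,y)\big)^2\,dx\,dy,$$

where $\hat{f}^Q_w(x,y) = \sum_{(k,l)\in\Omega} F^Q_{kl}\,\varphi_{kl}(x,y)$. Now, since the truncation residual 
$f_w - \hat{f}_w$ is orthogonal to every $\varphi_{kl}$ with $(k,l) \in \Omega$, 

$$\mathrm{MSE}^Q_w(\Delta_{11}) = \mathrm{MSE}_w(\Delta_{11}) + M N \sum_{(k,l)\in\Omega} \big(F_{kl} - F^Q_{kl}\big)^2.$$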

The expected $\mathrm{MSE}^Q_w(\Delta_{11})$ is therefore given by: 

$$E[\mathrm{MSE}^Q_w(\Delta_{11})] = r_0^2 - M N \sum_{(k,l)\in\Omega} E[F_{kl}^2] + M N \sum_{(k,l)\in\Omega} E\big[(F_{kl} - F^Q_{kl})^2\big].$$



Hence, in order to evaluate $E[\varepsilon_w^2]$ in a particular representation, when the image is 
sliced into $M \cdot N$ pieces, over each piece we use a subset $\Omega$ of the possible basis 
functions (i.e. $\Omega \subset \{(k,l) \mid k,l = 0,1,2,\dots\}$), and we quantize the coefficients with $b_{kl}$ 
bits, we have to evaluate 

$$r_0^2 - M N \sum_{(k,l)\in\Omega} \{\text{variance of } F_{kl}\} + M N \sum_{(k,l)\in\Omega} \{\text{error in quantizing } F_{kl}\}.$$



2.4 An Important Particular Case: Markov Process with Separable Cosine 
Bases 



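We have the statistics of $\{f_w(x,y)\}$ given by 

$$E[f_w(x,y)] = 0, \qquad E[f_w(x,y)\,f_w(x+\tau_x, y+\tau_y)] = r_0^2\, e^{-\alpha_x|\tau_x| - \alpha_y|\tau_y|},$$

and we choose a separable cosine basis for the slices, i.e., over $[0, \frac{1}{M}] \times [0, \frac{1}{N}]$, 
$\varphi_{kl}(x,y) = \varphi_k(x)\,\varphi_l(y)$, where $\varphi_k(x) = \sqrt{M(2-\delta_k)}\,\cos(kM\pi x)$, $k = 0,1,2,\dots$, and 
$\varphi_l(y) = \sqrt{N(2-\delta_l)}\,\cos(lN\pi y)$, $l = 0,1,2,\dots$, with $\delta_k = 1$ for $k = 0$ and $\delta_k = 0$ otherwise. 
To compute $E[\varepsilon_w^2]$ for this case we need to evaluate the variances of $F_{kl}$, defined as 
$F_{kl} = \iint_{\Delta_{11}} f_w(x,y)\,\varphi_k(x)\,\varphi_l(y)\,dx\,dy$. We have 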



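$$\begin{aligned}
E[F_{kl}^2] &= E\left[\iint_{\Delta_{11}}\iint_{\Delta_{11}} f_w(x,y)\,f_w(\xi,\eta)\,\varphi_k(x)\,\varphi_l(y)\,\varphi_k(\xi)\,\varphi_l(\eta)\,dx\,dy\,d\xi\,d\eta\right] \\
&= \iint_{\Delta_{11}}\iint_{\Delta_{11}} r_0^2\, e^{-\alpha_x|x-\xi|}\, e^{-\alpha_y|y-\eta|} \cdot M(2-\delta_k)\cos(k\pi M x)\cos(k\pi M \xi) \\
&\qquad\qquad \cdot N(2-\delta_l)\cos(l\pi N y)\cos(l\pi N \eta)\,dx\,dy\,d\xi\,d\eta.
\end{aligned}$$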



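Therefore, by separating the integrations we obtain 

$$\begin{aligned}
E[F_{kl}^2] = r_0^2\,&(2-\delta_k) \int_0^{1/M}\!\!\int_0^{1/M} e^{-\alpha_x|x-\xi|}\cos(k\pi M x)\cos(k\pi M \xi)\,M\,dx\,d\xi \\
\cdot\,&(2-\delta_l) \int_0^{1/N}\!\!\int_0^{1/N} e^{-\alpha_y|y-\eta|}\cos(l\pi N y)\cos(l\pi N \eta)\,N\,dy\,d\eta.
\end{aligned}$$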



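Changing variables of integration to $\tilde{x} = Mx \in [0,1]$, $\tilde{\xi} = M\xi \in [0,1]$, 
$\tilde{y} = Ny \in [0,1]$, and $\tilde{\eta} = N\eta \in [0,1]$ yields 

$$\begin{aligned}
E[F_{kl}^2] = r_0^2\,(2-\delta_k)(2-\delta_l)\,&\frac{1}{M} \int_0^1\!\!\int_0^1 e^{-\frac{\alpha_x}{M}|\tilde{x}-\tilde{\xi}|}\cos(k\pi\tilde{x})\cos(k\pi\tilde{\xi})\,d\tilde{x}\,d\tilde{\xi} \\
\cdot\,&\frac{1}{N} \int_0^1\!\!\int_0^1 e^{-\frac{\alpha_y}{N}|\tilde{y}-\tilde{\eta}|}\cos(l\pi\tilde{y})\cos(l\pi\tilde{\eta})\,d\tilde{y}\,d\tilde{\eta}.
\end{aligned}$$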



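Let us define, for compactness, the following integral: 

$$\mathcal{M}(A; k, l) = \int_0^1\!\!\int_0^1 e^{-A|x-\xi|}\cos(k\pi x)\cos(l\pi\xi)\,dx\,d\xi.$$

Then we see that 

$$E[F_{kl}^2] = r_0^2\,(2-\delta_k)\cdot\frac{1}{M}\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\cdot(2-\delta_l)\cdot\frac{1}{N}\,\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right).$$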
We have that 

$$\mathcal{M}(A; k, l) = \frac{A\,(1+\delta_k)}{A^2 + (k\pi)^2}\,\delta_{k-l} - \frac{A^2\left(2 - e^{-A}\left[(-1)^k + (-1)^l\right]\right)}{\left(A^2 + (l\pi)^2\right)\left(A^2 + (k\pi)^2\right)}\cdot\frac{1 + (-1)^{k+l}}{2}.$$
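
As a sanity check, the closed form of the double integral $\int_0^1\!\int_0^1 e^{-A|x-\xi|}\cos(k\pi x)\cos(l\pi\xi)\,dx\,d\xi$ can be compared against brute-force quadrature. The sketch below is an illustration only: the function names, the midpoint rule, and the grid size are assumptions introduced here, and the closed form coded is the one obtained by integrating the kernel directly (a diagonal term minus a parity-gated cross term). 

```python
import math


def m_closed(A, k, l):
    """Closed form of int_0^1 int_0^1 exp(-A|x-xi|) cos(k pi x) cos(l pi xi)."""
    dk = A * A + (k * math.pi) ** 2
    dl = A * A + (l * math.pi) ** 2
    # Diagonal term: only present when k == l; (1 + delta_k) is 2 for k == 0.
    diag = A * (2.0 if k == 0 else 1.0) / dk if k == l else 0.0
    # Cross term vanishes when k + l is odd.
    parity = 1.0 if (k + l) % 2 == 0 else 0.0
    cross = A * A * (2.0 - math.exp(-A) * ((-1) ** k + (-1) ** l)) / (dk * dl)
    return diag - cross * parity


def m_numeric(A, k, l, n=300):
    """Midpoint-rule approximation of the same double integral."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        cx = math.cos(k * math.pi * x)
        for j in range(n):
            xi = (j + 0.5) * h
            total += math.exp(-A * abs(x - xi)) * cx * math.cos(l * math.pi * xi)
    return total * h * h


for k, l in [(0, 0), (1, 1), (0, 2), (1, 2)]:
    print(k, l, m_closed(2.0, k, l), m_numeric(2.0, k, l))
```

For $k = l = 0$ the closed form reduces to $2(A - 1 + e^{-A})/A^2$, which gives an independent check of the constant term. 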



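Therefore some algebra yields 

$$E[F_{kl}^2] = \frac{4\,r_0^2\,\frac{\alpha_x}{M}\,\frac{\alpha_y}{N}}{M N\left[\left(\frac{\alpha_x}{M}\right)^2 + k^2\pi^2\right]\left[\left(\frac{\alpha_y}{N}\right)^2 + l^2\pi^2\right]}
\left[1 - \frac{(2-\delta_k)\,\frac{\alpha_x}{M}\left(1 - (-1)^k e^{-\alpha_x/M}\right)}{\left(\frac{\alpha_x}{M}\right)^2 + k^2\pi^2}\right]
\left[1 - \frac{(2-\delta_l)\,\frac{\alpha_y}{N}\left(1 - (-1)^l e^{-\alpha_y/N}\right)}{\left(\frac{\alpha_y}{N}\right)^2 + l^2\pi^2}\right].$$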



2.5 Incorporating the Effect of Coefficient Quantization 

We have that $E\big[(F_{kl} - F^Q_{kl})^2\big] \approx \mathcal{K}\cdot E[F_{kl}^2]\,/\,2^{2b_{kl}}$, where $\mathcal{K}$ is a constant in the range [1,3]. 

According to rate-distortion theory (for uniform and Gaussian variables), the above 
formula for evaluating the error due to quantization describes well the behavior of the error 
as a function of the number of bits allocated for representing $F_{kl}$. 

Putting the above results together, we get that the expected mean square error in 
representing images from the process $\{f_w(x,y)\}$ with Markov statistics, by slicing the 
image plane into $M \cdot N$ slices and using, over each slice, a cosine basis, is given by: 

$$E[\varepsilon_w^2] = r_0^2\left\{1 - \sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\left[1 - \frac{\mathcal{K}}{2^{2b_{kl}}}\right]\right\}.$$

This expression gives $E[\varepsilon_w^2]$ in terms of $r_0$, $\{\alpha_x, \alpha_y\}$ and $\{b_{kl}\}$, the bits allocated to 
the coefficients $F_{kl}$, where the subset of coefficients is given via $\Omega$. 



3 The Slicing and Bit-Allocation Optimization Problems 



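Suppose we consider 

$$E[\varepsilon_w^2] = r_0^2\left\{1 - \sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\left[1 - \frac{\mathcal{K}}{2^{2b_{kl}}}\right]\right\}$$

as a function of $M$, $N$, and $\{b_{kl}\}$. We have that the total bit usage in representing the image 
is 

$$B_{tot} = M \cdot N \cdot \sum_{(k,l)\in\Omega} b_{kl}.$$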



Now we can solve a variety of bit-allocation and slicing optimization problems. 
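
3.1 Optimal Local Bit Allocation and Slicing Given Total Bit Usage 

Given the constraint 

$$\sum_{(k,l)\in\Omega} b_{kl} = \frac{B_{tot}}{M \cdot N},$$

find $\{b_{kl}\}$ that minimize $E[\varepsilon_w^2]$. We need to minimize 

$$\sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\mathcal{K}\,2^{-2b_{kl}}.$$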


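This is a classical bit allocation process and we have that the optimal bit allocation 
yields (theoretically) the same error for all terms in 

$$\sum_{(k,l)\in\Omega}\frac{\sigma_{kl}^2}{2^{2b_{kl}}} = \sum_{(k,l)\in\Omega}\frac{\sigma_{kl}^2}{A_{kl}^2},$$

where $\sigma_{kl}^2 = (2-\delta_k)(2-\delta_l)\,\mathcal{M}(\alpha_x/M; k, k)\,\mathcal{M}(\alpha_y/N; l, l)$ and where we defined 
$A_{kl} = 2^{b_{kl}}$ as the number of quantization levels, see [?]. Hence we need 

$$\frac{\sigma_{kl}^2}{A_{kl}^2} = \text{Const} \;\Rightarrow\; A_{kl} = \frac{\sigma_{kl}}{\sqrt{\text{Const}}},$$

and we should have $\prod_{(k,l)\in\Omega} A_{kl}^2 = 2^{2B_{tot}/(MN)}$. The result is 

$$b_{kl} = \frac{1}{2}\log_2\sigma_{kl}^2 + \frac{B_{tot}}{M \cdot N \cdot |\Omega|} - \frac{1}{2|\Omega|}\log_2\Big(\prod_{(k,l)\in\Omega}\sigma_{kl}^2\Big).$$

With this optimal bit allocation the expression $\sum_{(k,l)\in\Omega}\sigma_{kl}^2/2^{2b_{kl}}$ is minimized to 

$$|\Omega|\cdot\text{Const} = |\Omega|\cdot 2^{-\frac{2B_{tot}}{M N |\Omega|}}\Big(\prod_{(k,l)\in\Omega}\sigma_{kl}^2\Big)^{1/|\Omega|}.$$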

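Hence, 

$$E[\varepsilon_w^2]_{opt} = r_0^2\left\{1 - \sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)
+ \mathcal{K}\,|\Omega|\,2^{-\frac{2B_{tot}}{M N |\Omega|}}\Big(\prod_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\Big)^{1/|\Omega|}\right\},$$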



an error expression in terms of $(B_{tot}, M, N, \Omega)$ and the second-order-statistics 
parameters $r_0$, $\alpha_x$, $\alpha_y$ of the $\{f_w(x,y)\}$-process. 
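
To make the dependence on the slicing concrete, the following Python sketch evaluates the optimally allocated error for the separable Markov model and scans the number of slices. It is a sketch under stated assumptions: the function names are hypothetical, the allocation is the unconstrained real-valued one (individual bit counts may come out negative and are not clamped), and the normalized variances are taken as $(2-\delta_k)(2-\delta_l)$ times the product of the two diagonal integrals with arguments $\alpha_x/M$ and $\alpha_y/N$. 

```python
import math


def m_diag(A, k):
    """Closed form of int_0^1 int_0^1 exp(-A|x-xi|) cos(k pi x) cos(k pi xi)."""
    d = A * A + (k * math.pi) ** 2
    sign = -1.0 if k % 2 else 1.0
    first = A * (2.0 if k == 0 else 1.0) / d
    second = 2.0 * A * A * (1.0 - sign * math.exp(-A)) / (d * d)
    return first - second


def sigma2(alpha_x, alpha_y, M, N, k, l):
    """Normalized variance of coefficient (k, l): M*N*E[F_kl^2] / r0^2."""
    wk = 1.0 if k == 0 else 2.0
    wl = 1.0 if l == 0 else 2.0
    return wk * wl * m_diag(alpha_x / M, k) * m_diag(alpha_y / N, l)


def optimal_error(r0_sq, alpha_x, alpha_y, M, N, omega, b_tot, K=3.0):
    """Expected MSE under the real-valued optimal bit allocation."""
    s2 = [sigma2(alpha_x, alpha_y, M, N, k, l) for (k, l) in omega]
    n = len(s2)
    log_geo_mean = sum(math.log2(s) for s in s2) / n
    # K * |Omega| * 2^(-2 B_tot / (M N |Omega|)) * geometric mean of variances.
    quant = K * n * 2.0 ** (-2.0 * b_tot / (M * N * n) + log_geo_mean)
    return r0_sq * (1.0 - sum(s2) + quant)


def best_slicing(r0_sq, alpha, omega, b_tot, K=3.0, max_m=64):
    """Scan symmetric slicings M = N = 1..max_m and return the minimizer."""
    errors = {
        m: optimal_error(r0_sq, alpha, alpha, m, m, omega, b_tot, K)
        for m in range(1, max_m + 1)
    }
    return min(errors, key=errors.get)


# Illustrative parameter choices (assumptions for this sketch).
omega = [(k, l) for k in range(8) for l in range(8) if k + l <= 7]
print(best_slicing(32500.0, 150.0, omega, 1.0e5))
```

Because the bit counts are not clamped to be non-negative here, the scan is only indicative at very low bit budgets. 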

3.2 Effect of Slicing with Rigid Relative Bit Allocation 



An alternative bit allocation strategy, perhaps more in the spirit of the classical JPEG 
standard, can also be thought of. Consider that $\Omega$ is chosen and the $b_{kl}$'s are also chosen 
a priori for all $(k,l) \in \Omega$. Then we have 

$$E[\varepsilon_w^2] = r_0^2\left\{1 - \sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\left[1 - \frac{\mathcal{K}}{2^{2b_{kl}}}\right]\right\}$$

as a function of $M$ and $N$. This function clearly decreases with increasing $M$ and 
$N$ since more and more bits are allocated to the image, and here $B_{tot} = M \cdot N \cdot \sum_{(k,l)\in\Omega} b_{kl}$. 
Suppose now that for $M = N = 1$ we choose a certain bit allocation 
for a given $\Omega$ (say $\Omega = \{(k,l) \mid k + l < \text{Limit},\ k,l = 0,1,2,\dots\}$), i.e., we choose $b_{kl}$, 
but now as we increase the number of slices (i.e. increase $M$ and $N$) we shall modify 
the $b_{kl}$'s to keep $B_{tot}$ constant by choosing $b_{kl}(M,N) = b_{kl}\cdot\frac{1}{MN}$. Here $B_{tot}$ 
remains constant and we can again analyze the behavior of $E[\varepsilon_w^2]$ as $M$ and $N$ vary. 



3.3 Soft Bit Allocation with Cost Functions for Error and Bit Usage 

We could also consider cost functions of the form $C_{mse}(E[\varepsilon_w^2]) + C_B(M \cdot N \cdot \text{Bits/slice})$, 
where $C_{mse}$ and $C_B$ are cost functions chosen according to the task at 
hand, and ask for the bit allocation that minimizes the joint functional, in the spirit of 
[5]. 



4 The Theoretical Predictions of the Model 

In the previous sections we proposed a model for the compression error as a function of 
the image statistics $(r_0, \alpha_x, \alpha_y)$, the given total bits budget $B_{tot}$, and the number of 
slicings $M$ and $N$. Here, we fix these parameters according to the behaviour of natural 
images and typical compression setups and study the behaviour of the theoretical model. 

Assume we have a gray scale image of size 512 x 512 with 8 bits/pixel as our 
original image. JPEG considers 8 x 8 slices of this image and produces, by digitizing 
the DCT transform coefficients with a predetermined quantization table, an approximate 
representation of these 8 x 8 slices. We would like to explain the observation that 
down-scaling the original image, prior to applying JPEG compression to a smaller image, 
produces, with the same bit usage, a better representation of the original image. 

Suppose the original image is regarded as the 'continuous' image defined over the 
unit square $[0,1] \times [0,1]$, as we have done in the theoretical analysis. Then, the pixel 
width of a 512 x 512 image will be 1/512. We shall assume that the original image is 
a realization of a zero mean 2D stationary random process with autocorrelation of the 
form 

$$R(i,j) = r_0^2\,\rho_1^{|i|}\,\rho_2^{|j|},$$

with $\rho_1$ and $\rho_2$ in the range [0.8, 0.9], as is usually done (see [?]). From a single 
image, $r_0^2$ can be estimated via the expression 

$$r_0^2 = \frac{1}{512 \times 512}\sum \cdots \approx 32{,}385.00,$$



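assuming an equalized histogram. If we consider that $\rho^{|i|} = e^{-\alpha\frac{|i|}{512}} = e^{-\frac{\alpha}{512}|i|}$, 
we can obtain an estimate for $\alpha$ using $e^{-\alpha/512} = \rho \in [0.8, 0.9]$. This provides 

$$-\frac{\alpha}{512} = \log_e\rho \;\longrightarrow\; \alpha = -512 \times \log_e\rho \in [50, 150].$$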
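
The mapping from the pixel-to-pixel correlation $\rho$ to the decay rate $\alpha$ on the unit square is a one-liner; the short Python sketch below (function name and default image width are illustrative assumptions) evaluates it at the two ends of the quoted range of $\rho$. 

```python
import math


def alpha_from_rho(rho, size=512):
    """Decay rate on the unit square for pixel correlation rho at pixel width 1/size."""
    return -size * math.log(rho)


for rho in (0.8, 0.9):
    print(rho, alpha_from_rho(rho))
```

For $\rho = 0.9$ this gives roughly 54 and for $\rho = 0.8$ roughly 114, both inside the rounded interval [50, 150] used in the evaluations. 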

The total number of bits for the image representation will range from 0.05 bpp to 
about 2.0 bpp, hence $B_{tot}$ will be between 512 x 512 x 0.05 = 13,107 and 512 x 
512 x 2 = 524,288 bits for 512 x 512 original images. Therefore, in the theoretical 
evaluations we shall take $\alpha_x, \alpha_y \in [50, 150]$ and $r_0^2 = 32500$ for 256 gray level images, 
with total bit usage between 10,000 and 1,000,000. 

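The symmetric x and y axis slicings considered will be $M, N = 1, 2, \dots, 64$, and we 
shall evaluate 

$$E[\varepsilon_w^2] = r_0^2\left\{1 - \sum_{(k,l)\in\Omega}(2-\delta_k)(2-\delta_l)\,\mathcal{M}\!\left(\frac{\alpha_x}{M}; k, k\right)\mathcal{M}\!\left(\frac{\alpha_y}{N}; l, l\right)\left[1 - \frac{\mathcal{K}}{A_{kl}^2}\right]\right\}$$

with the $A_{kl}$'s provided by the optimal level allocation. 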

Practically, the optimal level allocation should be given by $A_{kl} = \max(1, \lfloor A_{kl}\rfloor)$, 
a measure that automatically prevents the allocation of negative numbers of bits. 
Obviously this step must be followed by a re-normalization of the bit allocation in order to 
comply with the bits budget constraint. $\mathcal{K}$ can be taken from 1 to 3, whereas $\Omega$ will 
be $\{(k,l) \mid k + l \le 7,\ k,l = 0,1,\dots,7\}$, simulating the standard JPEG approach, which 
is the coding of 8 x 8 transform coefficients, emphasizing the low frequency range via the 
precise encoding of only about $|\Omega| = 36$ coefficients. 
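
The two practical ingredients just described, the triangular coefficient set and the clamping of level counts, are easy to state in code. The Python sketch below uses hypothetical helper names and deliberately stops before the re-normalization step, which the text does not specify in detail. 

```python
import math


def omega_jpeg(limit=7):
    """Index set {(k, l) : k + l <= limit, 0 <= k, l <= limit}."""
    return [
        (k, l)
        for k in range(limit + 1)
        for l in range(limit + 1)
        if k + l <= limit
    ]


def clamp_levels(levels):
    """Apply A <- max(1, floor(A)) to real-valued quantization level counts."""
    return [max(1, math.floor(a)) for a in levels]


def bits_used(levels):
    """Total bits implied by the level counts, sum of log2(A)."""
    return sum(math.log2(a) for a in levels)


print(len(omega_jpeg()))
print(clamp_levels([0.3, 1.0, 2.7, 16.2]))
```

A level count of 1 corresponds to zero bits, so clamping at 1 is exactly what rules out negative bit allocations; the budget then has to be re-checked with something like `bits_used`. 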

Using the above described parameter ranges, we plot the predictions of the analytical 
model for the expected mean square error as a function of the slicings $M$ with bit 
usage as a parameter. Figures 2 and 3 demonstrate the approximated error as a function 
of the number of slicings for various total numbers of bits. Figure 2 displays the 
predictions of the theoretical model in conjunction with optimal level allocation, while Figure 
3 uses the JPEG-style rigid relative bit allocation. In both figures the left side shows the 
results of restricting the number of bits or quantization levels to integers, while the right 
side shows the results allowing fractional bit and level allocation. 

These figures show that for every given total number of bits there is an optimal 
slicing parameter $M$ indicating the optimal scaling factor. Note that the integer 
allocation causes, in both cases, non-smooth behaviour. Also, in Figure 3 it appears that the 
minimum points are local ones and the error tends to decrease as $M$ increases. This 
phenomenon can be explained by the fact that we used an approximation of the 
quantization error which fails to predict the true error for a small number of bits at large 
scales. 




Fig. 2. Theoretical prediction based on optimal level allocation: MSE versus number of 
slicings $M$ with total bits usage as a parameter. Here, we used the typical values $\alpha = 150$ 
and $\mathcal{K} = 3$. 




Fig. 3. Rigid relative bit allocation based prediction of MSE versus number of slicings 
$M$ with total bits usage as a parameter. Here, we used the typical values $\alpha = 150$ and 
$\mathcal{K} = 1$. 



Figure 4 shows the theoretical prediction of PSNR versus bits per pixel curves for 
typical 512 x 512 images with different scales (different values of $M$, where scale = 
8M/512). One may observe that the curve intersections occur at similar locations as 
those of the experiments with real images shown in the next section. 

5 Compression Results for Natural and Synthetic Images 

In order to verify the validity of the analytic model and design a system for image 
transcoding, we generate synthetic images for which the autocorrelation is similar to 
that of a given image. Next, we plot the PSNR/bpp JPEG graphs for all JPEG qualities, 
one graph for each given scaling ratio. The statistical model is considered valid if the 
behaviour is similar for the natural image and the synthesized one. 

Fig. 4. Theoretical (two left frames) and rigid relative (two right frames) bit allocation 
based prediction of PSNR versus bits per pixel with image scaling as a parameter. 
Again, we used the typical values $\alpha = 150$, and $\mathcal{K} = 3$ for the theoretical prediction 
case and $\mathcal{K} = 1$ for the JPEG-style case. 



5.1 Image Synthesis 

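Assume that the autocorrelation function of an image $g(m,n)$ is that of a homogeneous 
random field of the form 

$$R_{gg}(m,n) = \frac{1}{MN}\sum_{m'=0}^{M-1}\sum_{n'=0}^{N-1} g(m',n')\,g(m'+m, n'+n) = e^{-\alpha_x|m| - \alpha_y|n|}.$$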



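Define the Fourier transform $\hat{g}(k) = \mathcal{F}\{g(x)\}$. Then, the power spectrum of the real 
signal is given by $\mathcal{F}\{R_{gg}(x)\} = P_{gg}(k) = \langle\hat{g}(k)\,\hat{g}^*(k)\rangle$. Now, considering the 1D 
signal with the above given statistics, we have $\langle\hat{g}(k)\,\hat{g}^*(k)\rangle = \mathcal{F}\{e^{-\alpha|x|}\} = \frac{2\alpha}{\alpha^2 + k^2}$. 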

Thus, we have that g{k) = . and 



s(x) s 

_ f a; > 0 

\ 0 a; < 0 



2a 1 
a'^ + k'^ ) 



In order to generate synthetic images, we 'color' a uniform random (white) noise 
as follows. Let $w$ be an M x N matrix in which each entry is a uniformly distributed 
random number. Next, let $p$ and $q$ be M x N matrices with elements 



p{m, n) 
q{m,n) 



^ 2 ^g-axm n= f,m= 

0 otherwise, 

m = ^,n = I, N 

0 otherwise. 
Our synthetic image is generated by the process 
$g(m,n) = \mathcal{F}_{2D}^{-1}\{\mathcal{F}_{2D}\{w\}\cdot\mathcal{F}_{2D}\{p\}\cdot\mathcal{F}_{2D}\{q\}\}$. 



5.2 Estimating the Image Statistics $\alpha_x$ and $\alpha_y$ 

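In order to generate a synthetic image with the same statistics as that of the natural 
one, we have to first estimate the properties of the given image. Let us present a simple 
method for estimating the image statistics. We already used the relation 
$\mathcal{F}\{R_{gg}(x,y)\} = P_{gg}(k,l) = \langle\hat{g}(k,l), \hat{g}^*(k,l)\rangle$. Explicitly, for our statistical image model we have 
that the power spectrum and the autocorrelation are given by 

$$\frac{2\alpha_x}{\alpha_x^2 + k^2}\cdot\frac{2\alpha_y}{\alpha_y^2 + l^2} = \langle\hat{g}(k,l), \hat{g}^*(k,l)\rangle,$$

$$e^{-\alpha_x|x| - \alpha_y|y|} = \mathcal{F}_{2D}^{-1}\{\langle\hat{g}(k,l), \hat{g}^*(k,l)\rangle\}.$$

Thus, all we need to do is to estimate the slopes of the plane given by 

$$\begin{aligned}
\alpha_x|x| + \alpha_y|y| &= -\ln\big(\mathcal{F}_{2D}^{-1}\{\langle\hat{g}(k,l), \hat{g}^*(k,l)\rangle\}\big) \\
&= -\ln\big(\mathcal{F}_{2D}^{-1}\{\langle\mathcal{F}_{2D}\{g(x,y)\}, (\mathcal{F}_{2D}\{g(x,y)\})^*\rangle\}\big).
\end{aligned}$$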



This was the estimation implemented in our experiments. 
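
The slope-fitting idea can be illustrated with a small Python sketch. It is a simplified stand-in, not the procedure used in the experiments: it works in 1D, it swaps in a direct sample autocorrelation for the Fourier-domain route, and it tests itself on a synthetic first-order Markov sequence. All function names and parameter values are assumptions made for the illustration. 

```python
import math
import random


def synth_markov(n, rho, seed=0):
    """Unit-variance first-order Markov sequence with lag-one correlation rho."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0)]
    innov = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        g.append(rho * g[-1] + innov * rng.gauss(0.0, 1.0))
    return g


def autocorr(g, max_lag):
    """Normalized sample autocorrelation for lags 0..max_lag."""
    n = len(g)
    mean = sum(g) / n
    c = [v - mean for v in g]
    r0 = sum(v * v for v in c) / n
    return [
        sum(c[i] * c[i + m] for i in range(n - m)) / (n * r0)
        for m in range(max_lag + 1)
    ]


def estimate_alpha(g, max_lag=5):
    """Least-squares slope through the origin of -ln R(m) versus lag m."""
    r = autocorr(g, max_lag)
    lags = list(range(1, max_lag + 1))
    neg_log = [-math.log(r[m]) for m in lags]
    return sum(m * v for m, v in zip(lags, neg_log)) / sum(m * m for m in lags)


g = synth_markov(20000, rho=0.9)
print(estimate_alpha(g), -math.log(0.9))
```

The estimate is in units of inverse samples; multiplying by the image width in pixels converts it to the unit-square convention of the model. Only small lags are used because the logarithm of a noisy autocorrelation becomes unreliable once the correlation is close to zero. 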



5.3 Experimental Results 

A JPEG compression performance comparison for a natural image and its random 
synthesized version is shown in Figure 5 for a 256 x 256 image (first row) and a 512 x 512 
image (second row). The figures show the compression results of synthetic versus natural 
images with similar statistics. Synthetic and original images and their corresponding 
autocorrelations are presented with their corresponding JPEG PSNR/bpp compression 
curves for 4 scales. The above experiments indicate that the crossing locations between 
scales in the synthetic images appear to be a good approximation of the crossings in the 
natural images. Thus, based on the second-order statistics of the image we can predict 
the optimal scale factor. Moreover, the non-stationary nature of images has a minor 
impact on the optimal scale factor. This is evident from the alignment of the results 
of the natural and the synthetic images. There appears to be a vertical gap (in PSNR) 
between the synthetic and the natural images. However, similar PSNR gaps appear also 
between different synthetic images. 



6 Conclusions 

We have presented an analytical model and a set of empirical results that verify our model 
and support the idea of scaling before transform coding for optimal compression. The 
numerical results prove the validity of the model, and the simple algorithms we 
introduced can be used in an on-line system to (i) extract the image statistical coefficients 
($\alpha_x$ and $\alpha_y$), and then (ii) use the image statistics, size, and bits budget to decide on the 
optimal scaling, e.g. for the JPEG compression in a transcoding system. In another 
report we will explore extensions and implementation issues, like extracting the image 
statistical characteristics from the JPEG DCT coefficients in an efficient way, obtaining 
second-order statistics locally and using a hierarchical slicing of the image into various 
block sizes, and more. 

Fig. 5. Comparison between a natural and a synthesized image with similar 
autocorrelation. 

Abstract. This paper begins with analyzing the theoretical connections 
between levelings on lattices and scale-space erosions on reference semi- 
lattices. They both represent large classes of self-dual morphological op- 
erators that exhibit both local computation and global constraints. Such 
operators are useful in numerous image analysis and vision tasks rang- 
ing from simplification, to geometric feature detection, to segmentation. 
Previous definitions and constructions of levelings were either discrete or 
continuous using a PDE. We bridge this gap by introducing generalized 
levelings based on triphase operators that switch among three phases, 
one of which is a global constraint. The triphase operators include as 
special cases reference semilattice erosions. Algebraically, levelings are 
created as limits of iterated or multiscale triphase operators. The sub- 
class of multiscale geodesic triphase operators obeys a semigroup, which 
we exploit to find a PDE that generates geodesic levelings. Further, we 
develop PDEs that can model and generate continuous-scale semilattice 
erosions, as a special case of the leveling PDE. We discuss theoretical 
aspects of these PDEs, propose discrete algorithms for their numerical 
solution which are proved to converge as iterations of triphase operators, 
and provide insights via image experiments. 



1 Introduction 

Nonlinear scale-space approaches that are based on morphological erosions and 
dilations are useful for edge-preserving multiscale smoothing, image enhance- 
ment and simplification, geometric feature detection, shape analysis, segmen- 
tation, motion analysis, and object recognition. Openings and closings are the 
basic morphological smoothing filters. The simplest openings/closings, which are 
compositions of Minkowski erosions and dilations, preserve well vertical image 
edges but may shift and blur horizontal edges/boundaries. A much more pow- 
erful class of filters are the reconstruction openings and closings which, starting 
from a reference signal consisting of several parts and a marker (initial seed) in- 
side some of these parts, can reconstruct whole objects with exact preservation of 
their boundaries and edges [?]. In this reconstruction process they simplify 

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 137-114^ 2001. 
the original image by completely eliminating smaller objects inside which the 
marker cannot fit. The reference signal plays the role of a global constraint. One 
disadvantage of both the simple and the reconstruction openings/closings 
is that they are not self-dual and hence they treat the image foreground and 
background asymmetrically. A recent solution to this asymmetry problem came from 
the development of a more general, powerful class of morphological filters, the 
levelings, introduced in [?] and further studied in [?], which include as special 
cases the reconstruction openings and closings. The levelings possess many 
useful algebraic and scale-space properties, as explored in [?], and can be generated 
by a nonlinear PDE introduced in [?]. 

A relatively new algebraic approach to self-dual morphology is based not on 
complete lattices but on inf-semilattices [?]. By using self-dual partial orderings 
the signal space becomes an inf-semilattice on which self-dual erosion operators 
can be defined [?] that have many interesting properties and applications. 

In this paper we develop theoretical connections between levelings on lattices 
and erosions on semilattices, both from an algebraic and a PDE viewpoint. We 
begin in Section 2 with a brief background discussion on multiscale operators 
defined on complete lattices and inf-semilattices. In Section 3 we introduce and 
analyze algebraically multiscale triphase operators (which switch among 3 different 
states, one state being a global constraint) whose special cases are semilattice 
erosions and whose limits are levelings. The semigroup of geodesic triphase 
operators is discovered. Afterwards, in Section 4 we model both geodesic levelings 
and semilattice erosions using PDEs. The main ingredient here is the leveling 
PDE, which we prove can generate the multiscale geodesic operators and (as 
a special case) multiscale semilattice self-dual erosions. Section 5 extends the 
PDE ideas to 2D image signals. In both Sections 4 and 5 we also propose discrete 
numerical algorithms for solving the PDEs, prove their convergence using the 
semilattice operators of previous sections, and provide insights via experiments. 



2 Signal Operators on Lattices and Inf-Semilattices 



A poset is any set equipped with a partial ordering $\le$. The supremum ($\vee$) and 
infimum ($\wedge$) of any subset of a poset are its least upper bound and greatest lower 
bound, respectively; both are unique if they exist. A poset is called a sup-(inf-) 
semilattice if the supremum (infimum) of any finite collection of its elements 
exists. A (sup-) inf-semilattice is called complete if the (supremum) infimum of 
arbitrary collections of its elements exists. A poset is called a (complete) lattice 
if it is simultaneously a (complete) sup- and an inf-semilattice. An operator $\psi$ 
on a complete lattice is called: increasing if it preserves the partial ordering 
[$f \le g \Longrightarrow \psi(f) \le \psi(g)$]; idempotent if $\psi^2 = \psi$; antiextensive (extensive) if 
$\psi(f) \le f$ ($f \le \psi(f)$). An operator $\varepsilon$ ($\delta$) on a complete inf- (sup-) semilattice 
is called an erosion (dilation) if it distributes over the infimum (supremum) of 
any collection of lattice elements. A negation $\nu$ is a bijective operator such that 
both $\nu$ and $\nu^{-1}$ are either decreasing or increasing and $\nu^2 = \mathrm{id}$, where id is the 
identity and $\nu \ne \mathrm{id}$. An operator $\psi$ is called self-dual if it commutes with a 
negation $\nu$. 

In this paper, the signal space is the collection $V^E$ of all signals/images defined 
on $E$ and assuming values in $V$, where $E = \mathbb{R}^d$ or $\mathbb{Z}^d$, $d = 1, 2$, and 
$V \subseteq \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$. The value set $V$ is equipped with some partial ordering that 
makes it a complete lattice or inf-semilattice. This lattice structure is inherited 
by the signal space by extending the partial order of $V$ to signals pointwise. 
Classical lattice-based mathematical morphology [2] uses as signal space the 
complete lattice $\mathcal{L}(E,V) = (V^E, \vee, \wedge)$ of signals $f : E \to V$ with values in $V = \overline{\mathbb{R}}$ 
or $\overline{\mathbb{Z}}$. In $\mathcal{L}$ the signal ordering is defined by $f \le g \iff f(x) \le g(x)\ \forall x$, and 
the signal infimum and supremum are defined by $(\bigwedge_i f_i)(x) = \inf_i f_i(x)$ and 
$(\bigvee_i f_i)(x) = \sup_i f_i(x)$. Let $B$ denote henceforth the d-dimensional unit-radius 
ball of $E$, assuming the Euclidean metric, and let $tB = \{tb : b \in B\}$, $t \ge 0$, be its 
scaled version. The simplest multiscale dilation/erosion on $\mathcal{L}$ are the Minkowski 
flat dilation/erosion of a signal $f$ by the sets $tB$: 

$$\delta_{tB}(f)(x) := (f \oplus tB)(x) = \bigvee_{a \in tB} f(x-a), \qquad \varepsilon_{tB}(f)(x) := (f \ominus tB)(x) = \bigwedge_{a \in tB} f(x+a) \qquad (1)$$
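
For a sampled 1D signal, the flat dilation and erosion by $tB$ reduce to a sliding maximum and minimum over a window of radius $t$. The Python sketch below illustrates this; the function names, the integer scale, and the choice to truncate the window at the signal borders are assumptions made here for illustration. 

```python
def flat_dilation(f, t=1):
    """Sliding maximum of f over the window [x - t, x + t], truncated at the borders."""
    n = len(f)
    return [max(f[max(0, x - t):min(n, x + t + 1)]) for x in range(n)]


def flat_erosion(f, t=1):
    """Sliding minimum of f over the window [x - t, x + t], truncated at the borders."""
    n = len(f)
    return [min(f[max(0, x - t):min(n, x + t + 1)]) for x in range(n)]


signal = [2, 3, 4, 4, 1, 5, 5]
print(flat_dilation(signal))  # [3, 4, 4, 4, 5, 5, 5]
print(flat_erosion(signal))   # [2, 2, 3, 1, 1, 1, 5]
```

Since the window is symmetric, $x - a$ and $x + a$ range over the same samples, and the erosion never exceeds the signal while the dilation never falls below it. 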

We shall also need the multiscale conditional dilation and erosion of a marker 
('seed') signal $f$ within a reference ('mask') signal $r$: 

$$\delta_{tB}(f|r) := (f \oplus tB) \wedge r, \qquad \varepsilon_{tB}(f|r) := (f \ominus tB) \vee r \qquad (2)$$

Iterating the conditional dilation (erosion) by a unit-scale $B$ yields the 
conditional reconstruction opening (closing) of $r$ from $f$. 
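
The iteration just described can be sketched for sampled 1D signals as follows. This is an illustration under assumptions: the helper names are hypothetical, the unit-scale window is three samples wide, and the marker is first clipped under the reference so that the iteration starts from a signal below the mask. 

```python
def flat_dilation(f, t=1):
    """Sliding maximum over the window [x - t, x + t], truncated at the borders."""
    n = len(f)
    return [max(f[max(0, x - t):min(n, x + t + 1)]) for x in range(n)]


def conditional_dilation(f, r, t=1):
    """Dilate f, then take the pointwise minimum with the reference r."""
    return [min(d, rv) for d, rv in zip(flat_dilation(f, t), r)]


def reconstruction_opening(marker, r, max_iter=10000):
    """Iterate the unit-scale conditional dilation until nothing changes."""
    cur = [min(m, rv) for m, rv in zip(marker, r)]
    for _ in range(max_iter):
        nxt = conditional_dilation(cur, r)
        if nxt == cur:
            break
        cur = nxt
    return cur


reference = [0, 3, 3, 0, 0, 5, 5, 0]
marker = [0, 0, 3, 0, 0, 0, 0, 0]
print(reconstruction_opening(marker, reference))  # [0, 3, 3, 0, 0, 0, 0, 0]
```

The plateau that contains the marker is restored with its exact boundaries, while the unmarked plateau is removed entirely. 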

Another important pair is the geodesic dilation and erosion. First we define 
them for sets $X \subseteq E$ (binary images). Let $M \subseteq E$ be a mask set and consider 
its geodesic metric $d_M(x,y)$, equal to the length of the geodesic path connecting 
the points $x, y$ inside $M$. If $B_M(x,t) = \{p \in M : d_M(x,p) \le t\}$ is the geodesic 
closed ball with center $x$ and radius $t \ge 0$, then the multiscale geodesic dilation 
and erosion of $X$ within $M$ are defined by $\delta^t(X|M) := \bigcup_{p \in X} B_M(p,t)$ and 
$\varepsilon^t(X|M) := [\delta^t(X^c|M^c)]^c$. By using threshold decomposition and synthesis 
of a signal $f$ from its threshold sets $\Theta_h(f) := \{x \in E : f(x) \ge h\}$ we can 
synthesize flat geodesic operators for signals by using as generators their set 
counterparts. The resulting multiscale geodesic dilation and erosion of $f$ within 
a mask signal $r$ are $\delta^t(f|r)(x) := \sup\{h \in \mathbb{R} : x \in \delta^t(\Theta_h(f)|\Theta_h(r))\}$ and 
$\varepsilon^t(f|r)(x) := -\delta^t(-f|-r)$. An equivalent expression is 

$$\delta^t(f|r)(x) = r(x) \wedge \bigvee_{d_{M_-}(x,p) \le t} f(p), \qquad \varepsilon^t(f|r)(x) = r(x) \vee \bigwedge_{d_{M_+}(x,p) \le t} f(p) \qquad (3)$$

where $M_- := \{x \in E : f(x) \le r(x)\}$ and $M_+ := \{x \in E : f(x) \ge r(x)\}$. By 
letting $t \to \infty$ the geodesic dilation (erosion) yields the geodesic reconstruction 
opening (closing) of $f$ within $r$: 

$$\rho^-(f|r) := \bigvee_{t \ge 0} \delta^t(f|r), \qquad \rho^+(f|r) := \bigwedge_{t \ge 0} \varepsilon^t(f|r) \qquad (4)$$
In [?] a recent approach for a self-dual morphology was developed based 
on inf-semilattices. Now, the signal space is the collection of all signals $f : E \to V$, 
where $V = \mathbb{R}$ or $\mathbb{Z}$. The value set $V$ becomes a complete inf-semilattice (cisl) if 
equipped with the following partial ordering and infimum: 

$$a \preceq_r b \iff \begin{cases} r \wedge b \le r \wedge a \\ r \vee b \ge r \vee a \end{cases}, \qquad \mathop{\curlywedge}^{r}_{i}\, a_i = \Big(r \wedge \bigvee_i a_i\Big) \vee \bigwedge_i a_i$$

for some fixed $r \in V$. The ordering $\preceq_r$ coincides with the activity ordering in 
Boolean lattices [?]. 

Given a reference signal $r(x)$, a valid signal cisl ordering is given by 

$$f \preceq_r g \iff f(x) \preceq_{r(x)} g(x)\ \forall x \;\Longrightarrow\; |f(x) - r(x)| \le |g(x) - r(x)|\ \forall x$$

and the corresponding signal cisl infimum becomes 

$$\Big(\mathop{\curlywedge}^{r}_{i}\, f_i\Big)(x) = \Big[r(x) \wedge \bigvee_i f_i(x)\Big] \vee \bigwedge_i f_i(x) = \Big[r(x) \vee \bigwedge_i f_i(x)\Big] \wedge \bigvee_i f_i(x)$$



Under the above cisl infimum, the signal space becomes a cisl denoted henceforth 
by $\mathcal{F}_r(E,V)$, or simply $\mathcal{F}_r$. Among all possible reference cisl's that result from 
various choices of the reference signal $r(x)$, the cisl $\mathcal{F}_0$ with $r(x) = 0$ is of primary 
importance because it is isomorphic to any other $\mathcal{F}_r$. Specifically, the bijection 
$\zeta : \mathcal{F}_0 \to \mathcal{F}_r$, given by $\zeta(f) = f + r$, is a cisl isomorphism. Thus, if $\psi_0$ is an 
operator on $\mathcal{F}_0$, then its corresponding operator on $\mathcal{F}_r$ is given by 

$$\psi_r(f) = \zeta\psi_0\zeta^{-1}(f) = r + \psi_0(f - r) \qquad (5)$$

If $\psi_0$ is an erosion on $\mathcal{F}_0$ that is translation-invariant (TI) and self-dual, then 
$\psi_r$ is also a self-dual TI erosion on $\mathcal{F}_r$. Note: the infimum, translation operator 
and negation operator on $\mathcal{F}_0$ are different from those on $\mathcal{F}_r$. For example, if 
$\nu_0(f) = -f$ is the negation on $\mathcal{F}_0$, then self-duality of $\psi_0$ means $\psi_0\nu_0 = \nu_0\psi_0$, 
whereas self-duality on $\mathcal{F}_r$ means $\psi_r\nu_r = \nu_r\psi_r$, where $\nu_r(f) = 2r - f$. 

The simplest multiscale TI self-dual erosion on the cisl $\mathcal{F}_r$ is the operator 

$$\psi_r^t(f)(x) = r(x) + \left(\Big[0 \wedge \bigvee_{a \in tB}\big(f(x-a) - r(x-a)\big)\Big] \vee \bigwedge_{a \in tB}\big(f(x-a) - r(x-a)\big)\right) \qquad (6)$$



3 Lattice Levelings and Multiscale Semilattice Erosions 

Defining levelings in $\mathcal{L}$ as in [?] requires a reference signal $r$, an input marker 
signal $f$, and a parallel triphase operator $\lambda_p$ defined by: 

(PT1) $\lambda_p(f, r, \alpha_p, \beta_p) := (r \wedge \beta_p(f)) \vee \alpha_p(f) = (r \vee \alpha_p(f)) \wedge \beta_p(f)$, 
(PT2) $\alpha_p, \beta_p$ are increasing and $\alpha_p(f) \le f \le \beta_p(f)$, $\forall f$, 

where the subscript 'p' denotes 'parallel'. In this paper we also define a more general 
triphase operator, the serial triphase operator $\lambda_s$, as follows: 

(ST1) $\lambda_s(f|r, \alpha_s, \beta_s) := \alpha_s(f|\beta_s(f|r))$, 
(ST2) $\alpha_s, \beta_s$ are increasing, $r \le \alpha_s(f|r) \le f \vee r$ and $r \ge \beta_s(f|r) \ge f \wedge r$, 

where the subscript 's' refers to 'serial' and the operators $\alpha_s$ and $\beta_s$ have two 
arguments $(f, r)$, written as $(f|r)$ to emphasize their different roles and provide 
a slightly different notation from the parallel case. Any parallel triphase operator 
becomes a serial one by setting $\alpha_s(f|r) = \alpha_p(f) \vee r$ and $\beta_s(f|r) = \beta_p(f) \wedge r$. 
(However, the converse is not always true.) Thus, we henceforth drop the 
subscripts 's' and 'p' from $\alpha$, $\beta$, $\lambda$ (the difference will be clear from the context) and 
focus more on the serial case. The triphase operators depend on four parameters; 
if some of them are known and fixed, we shall omit them. Thus we may write 
$\lambda(f|r)$ or simply $\lambda(f)$. A signal $f$ is called a parallel (serial) leveling of $r$ iff it 
is a fixed point of the parallel (serial) triphase operator, i.e. if $f = \lambda(f|r)$. The 
original definition in [?] corresponds to what we call here parallel leveling. 
The definition of the serial triphase operator implies the following. 

Proposition 1 For a serial triphase operator $\lambda(f|r) = \alpha(f|\beta(f|r))$:

(a) $\alpha(r|r) = \beta(r|r) = r$.

(b) $\alpha(f|r) = \alpha(f \vee r|r)$ and $\beta(f|r) = \beta(f \wedge r|r)$.

(c) At points where $f \ge r$, $f \ge \lambda(f|r) = \alpha(f|r) \ge r$.

(d) At points where $f \le r$, $f \le \lambda(f|r) = \beta(f|r) \le r$.

(e) $\alpha$ and $\beta$ commute, i.e. $\alpha(f|\beta(f|r)) = \beta(f|\alpha(f|r))$.

(f) $r \wedge \lambda = \beta(f|r)$ and $r \vee \lambda = \alpha(f|r)$.

Thus, the operator $\alpha$ ($\beta$) affects only points where $f > r$ ($f < r$). Some general
properties of triphase operators follow next.

Proposition 2 (a) Both parallel and serial triphase operators are antiextensive
in the cisl $\mathcal{F}_r$; i.e., $\lambda(f|r) \preceq_r f$.

(b) Let $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ create two (parallel or serial) triphase operators $\lambda_1$
and $\lambda_2$, respectively. If $\alpha_1 \ge \alpha_2$ and $\beta_1 \le \beta_2$, then $\lambda_2(f) \preceq_r \lambda_1(f)$, $\forall f$.

(c) If $\alpha$ and $\beta$ are dual of each other, then $\lambda$ is self-dual; i.e., if $\alpha(-f|-r) =
-\beta(f|r)$, then $\lambda(-f|-r) = -\lambda(f|r)$.

Thus, a leveling of $r$ from the marker $f$ can be obtained by iterating any (parallel
or serial) triphase operator $\lambda$ to infinity, or equivalently by taking the cisl
infimum of all iterations of $\lambda$. Specifically, if $\psi^n(f) := \psi(\cdots\psi(f))$ denotes
the $n$-fold composition of an operator $\psi$ with itself, then

$$\Lambda(f|r) := \lambda^\infty(f|r) = \overset{r}{\underset{n \ge 1}{\curlywedge}}\, \lambda^n(f) \preceq_r \cdots \preceq_r \lambda^2(f) \preceq_r \lambda(f) \preceq_r f \tag{7}$$

The map $r \mapsto \Lambda(\cdot|r)$ is called the leveling operator and is increasing and idempotent.
The signal $g = \Lambda(f|r)$ is obviously a leveling of $r$ from the marker $f$ since
$\lambda(g|r) = g$.
If we replace the operators $\alpha$ and $\beta$ with the multiscale flat erosion and
dilation by $B$ of (?) we obtain a multiscale conditional triphase operator

$$\lambda_{tB}(f|r)(x) := [r(x) \wedge \delta_{tB}(f)(x)] \vee \varepsilon_{tB}(f)(x) = \overset{r}{\underset{a \in tB}{\curlywedge}}\, f(x-a) \tag{8}$$
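The operator (8) and the fixed-point iteration (7) can be sketched for sampled 1D signals as follows. This is an illustrative sketch only: the helper names, the window clipped at the borders, and the iteration cap are assumptions.

```python
def flat_dilation(f, t=1):
    """Flat dilation by the window {-t, ..., t}, clipped at the borders."""
    n = len(f)
    return [max(f[max(0, x - t):min(n, x + t + 1)]) for x in range(n)]


def flat_erosion(f, t=1):
    """Flat erosion by the window {-t, ..., t}, clipped at the borders."""
    n = len(f)
    return [min(f[max(0, x - t):min(n, x + t + 1)]) for x in range(n)]


def conditional_triphase(f, r, t=1):
    """One application of (8): [r ^ dilation(f)] v erosion(f), pointwise."""
    dil, ero = flat_dilation(f, t), flat_erosion(f, t)
    return [max(min(ri, di), ei) for ri, di, ei in zip(r, dil, ero)]


def conditional_leveling(f, r, t=1, max_iter=10_000):
    """Iterate (8) until a fixed point is reached, in the spirit of (7)."""
    g = list(f)
    for _ in range(max_iter):
        nxt = conditional_triphase(g, r, t)
        if nxt == g:
            return g
        g = nxt
    raise RuntimeError("no fixed point reached within max_iter iterations")


reference = [0, 5, 5, 1, 4, 4, 0]
marker = [0, 2, 2, 0, 0, 0, 0]
leveled = conditional_leveling(marker, reference)
print(leveled)  # [0, 2, 2, 1, 1, 1, 0]
```

Since the marker lies below the reference here, the iteration only grows the marker under the reference, and the fixed point is a leveling in the sense of the definition above.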

It is called ‘conditional’ because it can be written as a serial triphase operator, 
i.e., as a composition of conditional dilation and erosion: 

$$\lambda_{tB}(f|r) = \varepsilon_{tB}(f|\delta_{tB}(f|r)) = \delta_{tB}(f|\varepsilon_{tB}(f|r)) \tag{9}$$

Comparing (6) with (8) reveals that $\lambda_{tB}$ becomes a multiscale TI semilattice
erosion on $\mathcal{F}_r$ if $r$ is constant. In particular, if $r = 0$, then $\lambda_{tB}$ becomes a
multiscale TI self-dual erosion on $\mathcal{F}_0$. For non-constant $r$, $\lambda_{tB}$ is generally neither
TI nor an erosion.

By replacing the conditional dilation and erosion in (9) with their geodesic
counterparts from [?] we obtain a multiscale serial geodesic triphase operator

$$\lambda^t(f|r) = \varepsilon^t(f|\delta^t(f|r)) = \delta^t(f|\varepsilon^t(f|r)) \tag{10}$$

This is the most important triphase operator because it obeys a semigroup. This
will allow us later to find its PDE generator.

Proposition 3 (a) As $t \to \infty$, $\lambda^t(f|r)$ yields the geodesic leveling which is the
composition of the geodesic reconstruction opening and closing:

$$\Lambda(f|r) := \lambda^\infty(f|r) = \rho^-(f|\rho^+(f|r)) = \rho^+(f|\rho^-(f|r)) \tag{11}$$

(b) The multiscale family $\{\lambda^t(\cdot|r) : t \ge 0\}$ forms an additive semigroup:

$$\lambda^t(\cdot|r)\,\lambda^s(\cdot|r) = \lambda^{t+s}(\cdot|r), \quad \forall t, s \ge 0. \tag{12}$$

(c) For a zero reference ($r = 0$), the multiscale geodesic triphase operator becomes
identical to its conditional counterpart and the multiscale semilattice erosion:

$$r = 0 \implies \psi_0^t(f) = \lambda^t(f|0) = \lambda_{tB}(f|0) \tag{13}$$

(d) For any $r$, the multiscale semilattice erosion $\psi_r^t$ obeys a semigroup:

$$\psi_r^t \psi_r^s = \psi_r^{t+s}, \quad \forall t, s \ge 0. \tag{14}$$

The above result establishes that, for any positive integer $n$, the $n$-th iteration
of the unit-scale geodesic triphase operator coincides with its multiscale version
at scale $t = n$. The same is true for the multiscale semilattice erosions. It is not
generally true, however, for the conditional triphase operator $\lambda_B(f|r)$, which
does not obey a semigroup. Further, its iterations converge to the conditional
leveling $\Lambda_B(f|r) = \lambda_B^\infty(f|r)$, which is smaller w.r.t. $\preceq_r$ than the geodesic leveling
$\Lambda(f|r) = \lambda^\infty(f|r)$ of (11). Namely, $r \preceq_r \Lambda_B(f|r) \preceq_r \Lambda(f|r)$.



4 PDEs for 1D Levelings and Semilattice Erosions



Consider a 1D reference signal $r(x)$ and a marker signal $f(x)$, both real-valued
and defined on $\mathbb{R}$. We start evolving the marker signal by producing the multiscale
geodesic triphase evolutions $u(x,t) = \lambda^t(f|r)(x)$ of $f(x)$ at scales $t \ge 0$.
The initial value is $u_0(x) = u(x,0) = f(x)$. In the limit we obtain the final result
$u_\infty(x) = u(x,\infty)$, which will be the leveling $\Lambda(f|r)$. The mapping $u_0 \mapsto u_\infty$ is a
leveling filter. In [?] it was explained that, if $f \le r$ ($f \ge r$), the leveling $\Lambda(f|r)$
is a reconstruction opening (closing).

In an effort to find a generator PDE for the function $u$, we shall attempt to
analyze the following evolution rule: $\partial u(x,t)/\partial t = \lim_{s \downarrow 0} [u(x,t+s) - u(x,t)]/s$.
Since $u$ satisfies the semigroup (12), the evolution rule becomes



$$\frac{\partial u}{\partial t}(x,t) = \lim_{s \downarrow 0} \frac{1}{s}\Bigl[\,\overset{r}{\underset{|a| \le s}{\curlywedge}}\, u(x-a,t) - u(x,t)\Bigr] \tag{15}$$



We shall show later that, at points where the partial derivatives exist, this rule
becomes the following PDE: $u_t = -\mathrm{sign}(u - r)|u_x|$. However, even if the initial
signal $f$ is differentiable, at finite scales $t > 0$, the above switched-erosion evolution
may create shocks (i.e., discontinuities in the derivatives). One way to deal
with shocks is to replace the standard derivatives with morphological sup/inf
derivatives as in [?]. For example, let



$$\mathcal{M}_x u(x,t) := \lim_{s \downarrow 0} \Bigl[\bigvee_{|a| \le s} u(x+a,t) - u(x,t)\Bigr] / s$$

be the sup-derivative of $u(x,t)$ along the $x$-direction, if the limit exists. If the
right derivative $u_x(x+,t)$ and the left derivative $u_x(x-,t)$ of $u$ along the $x$-direction exist, then
its sup-derivative also exists and is equal to

$$\mathcal{M}_x u(x,t) = \max[0, u_x(x+,t), -u_x(x-,t)] \tag{16}$$

Obviously, if the left and right derivatives exist and are equal, then the sup-derivative
becomes equal to the magnitude $|u_x(x,t)|$ of the standard derivative.
The nonlinear derivative $\mathcal{M}$ leads next to a more general PDE that can handle
discontinuities in $\partial u/\partial x$.
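A discrete analogue of (16) may help fix ideas. The sketch below assumes a uniformly sampled signal and approximates the right and left derivatives by forward and backward differences; the function name is an illustrative assumption.

```python
def sup_derivative(u, i, dx=1.0):
    """Discrete analogue of (16) at an interior sample i."""
    right = (u[i + 1] - u[i]) / dx  # forward difference, approximates u_x(x+)
    left = (u[i] - u[i - 1]) / dx   # backward difference, approximates u_x(x-)
    return max(0.0, right, -left)


ramp_up = [0, 1, 2, 3, 4]
ramp_down = [4, 3, 2, 1, 0]
valley = [2, 1, 0, 1, 2]
peak = [0, 1, 2, 1, 0]

print(sup_derivative(ramp_up, 2))    # 1.0, equals |u_x| on a smooth slope
print(sup_derivative(ramp_down, 2))  # 1.0
print(sup_derivative(valley, 2))     # 1.0, the signal can grow on both sides
print(sup_derivative(peak, 2))       # 0.0, no neighbour exceeds a local maximum
```

The last two cases are the shock situations in which the ordinary derivative does not exist but the sup-derivative does.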

Theorem 1.¹ Let $u(x,t) = \lambda^t(f|r)(x)$ be the scale-space function of multiscale
geodesic triphase operations with initial condition $u(x,0) = f(x)$. Assume that $f$
is continuous and possesses left and right derivatives at all $x$. (a) If the partial
sup-derivative $\mathcal{M}_x u$ exists at some $(x,t)$, then

$$\frac{\partial u}{\partial t}(x,t) = \begin{cases} \mathcal{M}_x(u)(x,t), & \text{if } u(x,t) < r(x) \\ -\mathcal{M}_x(-u)(x,t), & \text{if } u(x,t) > r(x) \\ 0, & \text{if } u(x,t) = r(x) \end{cases} \tag{17}$$

(b) If the partial left and right derivatives $u_x(x\pm,t)$ exist at some $(x,t)$, then

$$\frac{\partial u}{\partial t}(x,t) = \begin{cases} \max[0, u_x(x+,t), -u_x(x-,t)], & \text{if } u(x,t) < r(x) \\ \min[0, u_x(x+,t), -u_x(x-,t)], & \text{if } u(x,t) > r(x) \\ 0, & \text{if } u(x,t) = r(x) \end{cases} \tag{18}$$

(c) If the partial derivative $\partial u/\partial x$ exists at some $(x,t)$, then $u$ satisfies

$$\frac{\partial u}{\partial t}(x,t) = -\mathrm{sign}[u(x,t) - r(x)]\,\Bigl|\frac{\partial u}{\partial x}(x,t)\Bigr| \tag{19}$$

¹ Due to space limitations, the proofs of all theorems and propositions will be given
in a forthcoming longer paper.



Thus, assuming that $\partial u/\partial x$ exists and is continuous, the nonlinear PDE (19)
can generate the multiscale evolution of the initial signal $u(x,0) = f(x)$ under
the action of the triphase operator. However, even if $f$ is differentiable, as the
scale $t$ increases, this evolution can create shocks. In such cases, the more general
PDE (18) that uses morphological derivatives still holds and can propagate the
shocks provided the equation evolves in such a way as to give solutions that are
piecewise differentiable with left and right limits at each point.

Consider now on the cisl $\mathcal{F}_0$ the multiscale TI semilattice erosions of a 1D
signal $f(x)$ by 1D disks $tB = [-t,t]$:

$$v(x,t) = \psi_0^t(f)(x) = \Bigl[0 \wedge \bigvee_{|a| \le t} f(x-a)\Bigr] \vee \bigwedge_{|a| \le t} f(x-a) \tag{20}$$

This new scale-space function $v(x,t)$ becomes a special case of the corresponding
function $u(x,t)$ for multiscale geodesic triphase operations when the reference $r$
is zero. Thus, we can use the leveling PDE (19) with $r(x) = 0$ to generate the
evolutions $v(x,t)$:

$$\begin{cases} \partial v/\partial t = -\mathrm{sign}(v)\,|\partial v/\partial x| \\ v(x,0) = f(x) \end{cases} \tag{21}$$

If $r(x)$ is not zero, then from the rule (?) that builds operators in $\mathcal{F}_r$ from
operators in $\mathcal{F}_0$, we can generate multiscale TI semilattice erosions $\psi_r^t(f) =
r + \psi_0^t(f - r)$ of $f$, defined explicitly in (6), by the following PDE system

$$\psi_r^t(f)(x) = r(x) + v(x,t), \qquad \begin{cases} \partial v/\partial t = -\mathrm{sign}(v)\,|v_x| \\ v(x,0) = f(x) - r(x) \end{cases} \tag{22}$$


To find a numerical algorithm for solving the previous PDEs, let $U_i^n$ be the
approximation of $u(x,t)$ on a grid $(i\Delta x, n\Delta t)$. Similarly, define $R_i := r(i\Delta x)$
and $F_i := f(i\Delta x)$. Consider the forward and backward difference operators:

$$D^{+x} U_i^n := (U_{i+1}^n - U_i^n)/\Delta x, \qquad D^{-x} U_i^n := (U_i^n - U_{i-1}^n)/\Delta x \tag{23}$$

To produce a shock-capturing and entropy-satisfying numerical method for solving
the leveling PDE (19) we approximate the more general PDE (18) by replacing
time derivatives with forward differences and left/right spatial derivatives
with backward/forward differences. This yields the following algorithm:

$$\begin{aligned} U_i^{n+1} = U_i^n - \Delta t\,[\,&(P_i^n)^+ \max(0, D^{-x}U_i^n, -D^{+x}U_i^n) \\ +\, &(P_i^n)^- \max(0, -D^{-x}U_i^n, D^{+x}U_i^n)\,] \\ \mathrm{sign}(U_i^{n+1} - R_i) &= \mathrm{sign}(F_i - R_i) \end{aligned} \tag{24}$$

where $P_i^n = \mathrm{sign}(U_i^n - R_i)$, $q^+ = \max(0,q)$, and $q^- = \min(0,q)$. We iterate
the above scheme for $n = 1, 2, \ldots$ starting from the initial data $U_i^0 = F_i$. For
stability, $(\Delta t/\Delta x) \le 0.5$ is required. The above scheme can be expressed as
iteration of a conditional triphase operator $\Phi$ acting on the cisl $\mathcal{F}_R$:

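$$\begin{aligned} U_i^{n+1} &= \Phi(U_i^n) = [R_i \wedge \beta(U_i^n)] \vee \alpha(U_i^n), \\ \alpha(F_i) &= \min[F_i,\ \theta F_{i-1} + (1-\theta)F_i,\ \theta F_{i+1} + (1-\theta)F_i], \\ \beta(F_i) &= \max[F_i,\ \theta F_{i-1} + (1-\theta)F_i,\ \theta F_{i+1} + (1-\theta)F_i], \quad \theta = \Delta t/\Delta x. \end{aligned} \tag{25}$$

By using ideas from methods of solving PDEs corresponding to hyperbolic conservation
laws [?] we can easily show that this scheme is conservative and
monotone increasing (for $\Delta t/\Delta x \le 1$), and hence satisfies the entropy condition.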
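The iteration (25) is short enough to sketch directly. The code below is an illustration under stated assumptions, not the authors' implementation: borders are handled by replicating the edge sample, and the stopping tolerance and iteration cap are arbitrary choices.

```python
def leveling_step(U, R, theta):
    """One iteration U -> [R ^ beta(U)] v alpha(U) of scheme (25)."""
    n = len(U)
    out = []
    for i in range(n):
        left = U[i - 1] if i > 0 else U[i]        # replicated border (assumption)
        right = U[i + 1] if i < n - 1 else U[i]
        a = theta * left + (1 - theta) * U[i]
        b = theta * right + (1 - theta) * U[i]
        alpha = min(U[i], a, b)  # erosion-like candidate
        beta = max(U[i], a, b)   # dilation-like candidate
        out.append(max(min(R[i], beta), alpha))
    return out


def leveling_pde(F, R, theta=0.5, tol=1e-12, max_iter=100_000):
    """Iterate leveling_step from the marker F until the update is below tol."""
    U = list(F)
    for _ in range(max_iter):
        V = leveling_step(U, R, theta)
        if max(abs(v - u) for v, u in zip(V, U)) <= tol:
            return V
        U = V
    return U


R = [0, 5, 5, 1, 4, 4, 0]
F = [0, 2, 2, 0, 0, 0, 0]
U_half = leveling_pde(F, R, theta=0.5)  # within the stated stability bound
U_unit = leveling_pde(F, R, theta=1.0)  # dt = dx: unit-window conditional operator
```

With `theta=1.0` the candidates reduce to the minimum and maximum over the window {-1, 0, 1}, so the result is the conditional leveling of the sampled signals.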

There are also other possible approximation schemes such as the conservative
and monotone scheme proposed in [?] to solve the edge-sharpening PDE $u_t =
-\mathrm{sign}(u_{xx})|u_x|$. In order to solve the leveling PDE, we have modified this scheme
to enforce the sign consistency condition $\mathrm{sign}(U_i^n - R_i) = \mathrm{sign}(F_i - R_i)$. The
final algorithm can be expressed via the iteration of a discrete operator $\Phi$ as in (25)
but with different $\alpha$ and $\beta$:

$$\begin{aligned} \alpha(F_i) &= F_i - \theta\sqrt{[\max(F_i - F_{i-1}, 0)]^2 + [\min(F_{i+1} - F_i, 0)]^2}, \\ \beta(F_i) &= F_i + \theta\sqrt{[\min(F_i - F_{i-1}, 0)]^2 + [\max(F_{i+1} - F_i, 0)]^2} \end{aligned} \tag{26}$$

This second approximation scheme is more diffusive and requires more computation
per iteration than the first scheme (25). Thus, as the main numerical
algorithm to solve the leveling PDE, we henceforth adopt the first scheme (25),
which is based on discretizing the morphological derivatives. Examples of running
this algorithm are shown in Fig. 1. An important question is whether the
two above algorithms converge. The answer is affirmative as proved next.

Proposition 4 If $\Phi(\cdot) = [R \wedge \beta(\cdot)] \vee \alpha(\cdot)$ and $(\alpha, \beta)$ are either as in (25) or
as in (26), then the sequence $U^{n+1} = \Phi(U^n)$, $U^0 = F$, converges to a unique limit
$U^\infty = \Phi^\infty(F)$ which is a leveling of $R$ from $F$.

If $\Delta t = \Delta x$, then $\Phi$ of (25) becomes a discrete conditional triphase operator
with a unit-scale window $B = \{-1, 0, 1\}$, the PDE numerical algorithm coincides
with the iterative discrete algorithm of [?], and the limit of the algorithm is the
conditional leveling of $R$ from $F$.

5 PDEs for 2D Levelings and Semilattice Erosions 

A straightforward extension of the leveling PDE from 1D to 2D signals is to
replace the 1D dilation PDE with the PDE generating multiscale dilations by a
disk. Then the 2D leveling PDE becomes:

$$\begin{cases} u_t(x,y,t) = -\mathrm{sign}[u(x,y,t) - r(x,y)]\,\|\nabla u(x,y,t)\| \\ u(x,y,0) = f(x,y) \end{cases} \tag{27}$$

Fig. 1. (a) A reference signal $r$ (dash line), a marker signal $m$ (thin solid
line) and its evolutions $u(x,t)$ (thin dash line) generated by the leveling PDE
$u_t = -\mathrm{sign}(u - r)|u_x|$, at $t = n \cdot 25\Delta t$, $n = 1, 2, 3, 4$. (b) Multiscale semilattice
erosions $v(x,t)$ of $m(x)$ w.r.t. zero reference, generated by the PDE
$v_t = -\mathrm{sign}(v)|v_x|$, $v(x,0) = m(x)$, at $t = n \cdot 25\Delta t$, $n = 1, 2, 3, 4$. (c) Multiscale
semilattice erosions $v(x,t) + r(x)$ of $m(x)$ w.r.t. reference $r(x)$, generated
by the PDE $v_t = -\mathrm{sign}(v)|v_x|$, $v(x,0) = m(x) - r(x)$, at $t = n \cdot 25\Delta t$, $n = 1, 2$.
($\Delta x = 0.001$, $\Delta t = 0.0005$.)



Of course, we could select any other PDE modeling erosions by shapes other 
than the disk, but the disk has the advantage of creating an isotropic growth. 

For discretization, let $U_{i,j}^n$ be the approximation of $u(x,y,t)$ on a computational
grid $(i\Delta x, j\Delta y, n\Delta t)$ and set the initial condition $U_{i,j}^0 = F_{i,j} = f(i\Delta x, j\Delta y)$.
Then, by replacing the magnitudes of standard derivatives with morphological
derivatives and by expressing the latter with left and right derivatives which are
approximated with backward and forward differences, we have developed the
following entropy-satisfying scheme for solving the 2D leveling PDE (27):

$$\begin{aligned} \alpha(F_{i,j}) &= F_{i,j} - \Delta t \sqrt{\max{}^2[0, D^{-x}F_{i,j}, -D^{+x}F_{i,j}] + \max{}^2[0, D^{-y}F_{i,j}, -D^{+y}F_{i,j}]} \\ \beta(F_{i,j}) &= F_{i,j} + \Delta t \sqrt{\max{}^2[0, -D^{-x}F_{i,j}, D^{+x}F_{i,j}] + \max{}^2[0, -D^{-y}F_{i,j}, D^{+y}F_{i,j}]} \end{aligned} \tag{28}$$

For stability, $(\Delta t/\Delta x + \Delta t/\Delta y) \le 0.5$ is required. This scheme is theoretically
guaranteed to converge to a leveling. Examples of running the above 2D algorithm
are shown in Fig. 2.
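One step of the 2D scheme can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: the quantities $\alpha$ and $\beta$ of (28) are combined through $[R \wedge \beta] \vee \alpha$ exactly as in the 1D operator, and one-sided differences across the image border are set to zero.

```python
import math


def leveling_step_2d(U, R, dt, dx=1.0, dy=1.0):
    """One update with alpha, beta from (28), combined as [R ^ beta] v alpha."""
    nx, ny = len(U), len(U[0])
    out = [row[:] for row in U]
    for i in range(nx):
        for j in range(ny):
            c = U[i][j]
            # Backward/forward differences; zero across the border (assumption).
            dmx = (c - U[i - 1][j]) / dx if i > 0 else 0.0
            dpx = (U[i + 1][j] - c) / dx if i < nx - 1 else 0.0
            dmy = (c - U[i][j - 1]) / dy if j > 0 else 0.0
            dpy = (U[i][j + 1] - c) / dy if j < ny - 1 else 0.0
            alpha = c - dt * math.hypot(max(0.0, dmx, -dpx), max(0.0, dmy, -dpy))
            beta = c + dt * math.hypot(max(0.0, -dmx, dpx), max(0.0, -dmy, dpy))
            out[i][j] = max(min(R[i][j], beta), alpha)
    return out


R2 = [[4.0] * 3 for _ in range(3)]
U0 = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
U1 = leveling_step_2d(U0, R2, dt=0.25)
print(U1)  # [[0.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 0.0]]
```

Because the marker lies below the reference, only the dilation-like branch is active and the central bump starts to spread toward its four neighbours.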

Why use PDEs for levelings and semilattice erosions? In addition to the well-known
advantages of the PDE approach (such as more insightful mathematical
modeling, more connections with physics, better approximation of Euclidean
geometry, and subpixel accuracy), there are also some advantages over the discrete
modeling that are specific for the operators examined in this paper. For
levelings the desired result is mainly the final limit. The PDE numerical algorithms
converge to a leveling $\Lambda_{num}$. The discrete (algebraic) algorithm of [?]
converges to the conditional leveling $\Lambda_{con}$. If $\Lambda$ is the sampled true (geodesic)
leveling, then $r \preceq_r \Lambda_{con} \preceq_r \Lambda_{num} \preceq_r \Lambda$. Hence, the discrete algorithm result
has a larger absolute deviation from the true solution than the PDE algorithm.
Further, the discrete algorithm uses $\Delta t = \Delta x$ and hence it is unstable (amplifies
small errors). In the 2D case we have an additional comparison issue: In
some applications we may need to stop the marker growth before convergence.
In such cases, the isotropy of the partially grown marker offered by the PDE is
an advantage.

Panels: (a) Reference; (b) Marker ($t = 0$); (c) Lev. Evolution ($t = 10\Delta t$); (d) Lev. Evolution ($t = 20\Delta t$); (e) Leveling ($t = \infty$); (f) Semilatt. Erosion ($t = 3\Delta t$); (g) Semilatt. Erosion ($t = 6\Delta t$); (h) Semilatt. Erosion ($t = \infty$).

Fig. 2. Multiscale semilattice erosions and levelings of soilsection images generated
by PDEs. (a) Reference image $r(x,y)$. (b) Marker image $m(x,y)$ obtained
from a 2D convolution of $r$ with a 2D Gaussian of $\sigma = 4$. Images (c),(d),(e) show
evolutions $u(x,y,t)$ generated by the leveling PDE $u_t = -\mathrm{sign}(u - r)\|\nabla u\|$.
Images (f),(g),(h) show multiscale semilattice erosions $v(x,y,t) + r(x,y)$ generated
by the PDE $v_t = -\mathrm{sign}(v)\|\nabla v\|$ with $v(x,y,0) = m(x,y) - r(x,y)$.
($\Delta x = \Delta y = 1$, $\Delta t = 0.25$.)

For multiscale semilattice operators the final limit is not interesting since
it coincides with the reference; i.e., $\psi_r^\infty(f) = r$. What is more interesting
in this case are the intermediate results. In this case producing 2D semilattice
erosions via the following PDE system yields isotropic results:

$$\psi_r^t(f)(x,y) = r(x,y) + v(x,y,t), \qquad \begin{cases} \partial v/\partial t = -\mathrm{sign}(v)\,\|\nabla v\| \\ v(x,y,0) = f(x,y) - r(x,y) \end{cases} \tag{29}$$
Abstract. We have been witnessing lately a convergence among mathematical
morphology and other nonlinear fields, such as curve evolution,
PDE-based geometrical image processing, and scale-spaces. An obvious
benefit of such a convergence is a cross-fertilization of concepts
and techniques among these fields. The concept of adjunction however,
so fundamental in mathematical morphology, is not yet shared by other
disciplines. The aim of this paper is to show that other areas in image
processing can possibly benefit from the use of adjunctions. In particular,
it will be explained that adjunctions based on a curve evolution
scheme can provide idempotent shape filters. This idea is illustrated in
this paper by means of a simple affine-invariant polygonal flow.



1 Introduction 

One of the most fundamental concepts in mathematical morphology is that of 
adjunction. An adjunction consists of an erosion/dilation pair, linked to each 
other in a unique way by a duality property. To define adjunctions, one merely 
needs to assume that the set of signals S (e.g., images, shapes, etc.) carries some 
partial ordering <. 

Adjunctions have several simple but interesting algebraic properties. In par- 
ticular, the concatenations of the dilation and erosion that form an adjunction 
give rise to idempotent operators known as opening and closing. As a result, ad- 
junctions have become the most important theoretical concept in mathematical 
morphology, and despite (or perhaps, thanks to) their mathematical simplicity, 
their “adoption” has turned the research area of morphology into one with a 
strong mathematical foundation. 

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 149 ff., 2001.
© Springer-Verlag and IEEE/CS 2001

Despite their success in morphology, adjunctions have never been considered
in the context of other image processing disciplines. Likely, this is due to the fact
that adjunctions are often associated with dilations and erosions in their classical
meaning. The work presented in [?] is a first attempt to alter this situation. In
that paper, the first author suggested that an extended concept of adjunction
can serve as a basis for signal processing in general, including the linear case! In
fact, it is argued there that partially ordered sets and adjunctions can serve as
a paradigm for signal processing tools in general. This paradigm consists of the
following key ingredients:

— the information carried by various signals can be compared by using a prop- 
erly chosen partial ordering; 

— signal simplification and signal reconstruction, respectively, can be modeled 
by means of adjunctions; 

— (idempotent) image filtering is achieved by an opening; the anti-extensivity 
of an opening guarantees that the filter reduces the information content. 

The objective of this paper is to convey some of the ideas presented in [?],
but focusing specifically on the area of scale-spaces in general, and pyramids and
curve evolution in particular. Familiarity with [?] is not required for reading and
understanding this paper.

After a brief review of the theoretical background, the first part of this paper
shows the strong link between adjunctions and signal pyramids. The recent work
in [3] is considered here as the basic framework for pyramids in general, and its
axiomatic assumptions are proven to be closely related to those of an adjunction-based
setting. Then, application of adjunctions in the context of curve evolution
is considered. A simple case is studied, which illustrates the creation of an idempotent
filter for shape denoising, derived from a curve flow.

2 Background on Adjunctions 

The concept of adjunction was introduced by Heijmans and Ronse in [?] in the
context of classical mathematical morphology. In that context, the set of signals
is assumed to have a complete lattice structure (see [4,5] for the definition of
complete lattices and their role as a framework for classical mathematical morphology).
However, the very same concept of adjunctions allows the extension
of the theoretical morphology framework beyond complete lattices, first to complete
semilattices [?], and then to generic partially ordered sets (posets) [1,2].
The latter framework is the one adopted here.

The above means that all that is required from a set of signals in order to 
enable the definition of adjunctions is a partial ordering. 

Definition 1. (Partial Ordering) A relation $\le$ in a set $\mathcal{S}$ is a partial ordering
if it is reflexive ($s \le s$), anti-symmetric (if $s \le r$ and $r \le s$, then $s = r$), and
transitive (if $s \le r$ and $r \le q$, then $s \le q$).
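On a finite set the three axioms of Definition 1 can be checked by brute force. The helper below is an illustrative sketch; its name and the example relations are assumptions chosen for the demonstration.

```python
from itertools import product


def is_partial_order(elements, leq):
    """Check reflexivity, anti-symmetry and transitivity of leq on a finite set."""
    elements = list(elements)
    reflexive = all(leq(s, s) for s in elements)
    antisymmetric = all(not (leq(s, r) and leq(r, s)) or s == r
                        for s, r in product(elements, repeat=2))
    transitive = all(not (leq(s, r) and leq(r, q)) or leq(s, q)
                     for s, r, q in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive


divides = lambda a, b: b % a == 0            # divisibility: a partial ordering
strictly_less = lambda a, b: a < b           # fails reflexivity
same_parity = lambda a, b: (a - b) % 2 == 0  # fails anti-symmetry

print(is_partial_order(range(1, 13), divides))      # True
print(is_partial_order(range(5), strictly_less))    # False
print(is_partial_order(range(5), same_parity))      # False
```

Divisibility is a useful reminder that a partial ordering need not compare every pair of elements.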

A set ordered by a partial ordering is called a partially ordered set or poset. 
Adjunctions on posets are then defined as follows. 

Definition 2. (Adjunctions) Let $\le_\mathcal{S}$ and $\le_\mathcal{R}$ be two partial orderings defined
on sets $\mathcal{S}$ and $\mathcal{R}$ respectively. Let $\varepsilon : \mathcal{S} \to \mathcal{R}$ and $\delta : \mathcal{R} \to \mathcal{S}$ be two operators.
The pair $(\varepsilon, \delta)$ is called an adjunction between the posets $(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$
iff $r \le_\mathcal{R} \varepsilon(s) \iff \delta(r) \le_\mathcal{S} s$, $\forall s \in \mathcal{S}, r \in \mathcal{R}$.

In classical morphology, adjunctions are known to be related to the concepts
of erosion and dilation [?]. Here we shall, for the sake of simplicity, actually
define erosions and dilations in terms of adjunctions.

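Definition 3. (Erosions and Dilations) An operator $\varepsilon : \mathcal{S} \to \mathcal{R}$ is called an
erosion between $(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$ iff there exists an operator $\delta$, such that
$(\varepsilon, \delta)$ forms an adjunction between $(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$. Similarly, $\delta$ will be
called a dilation iff there exists an operator $\varepsilon$, such that $(\varepsilon, \delta)$ is an adjunction.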



Definition 4. (Morphological Opening and Closing) Let $(\varepsilon, \delta)$ be an adjunction.
The operator $\alpha = \delta\varepsilon$ is called the opening associated with the adjunction
$(\varepsilon, \delta)$. Similarly, the operator $\beta = \varepsilon\delta$ is called the closing associated with
the adjunction $(\varepsilon, \delta)$.

We summarize some algebraic properties. 

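Proposition 1. (Uniqueness) If $(\varepsilon, \delta_1)$ and $(\varepsilon, \delta_2)$ are adjunctions between
$(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$, then $\delta_1 = \delta_2$. Similarly, if $(\varepsilon_1, \delta)$ and $(\varepsilon_2, \delta)$ are adjunctions,
then $\varepsilon_1 = \varepsilon_2$.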

In other words, the dilation which forms an adjunction with a given erosion is 
unique, and we shall speak of the adjoint dilation. The adjoint erosion is defined 
in an analogous manner. 

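Proposition 2. If $(\varepsilon, \delta)$ is an adjunction between $(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$, then

1. $\varepsilon\delta\varepsilon = \varepsilon$.

2. $\delta\varepsilon\delta = \delta$.

3. The erosion $\varepsilon$ and the dilation $\delta$ are both increasing.¹

4. $\delta(r) = \inf\{s \in \mathcal{S} \mid r \le_\mathcal{R} \varepsilon(s)\}$, $\forall r \in \mathcal{R}$.

5. $\varepsilon(s) = \sup\{r \in \mathcal{R} \mid \delta(r) \le_\mathcal{S} s\}$, $\forall s \in \mathcal{S}$.

The identity in 4 also involves the fact that the infimum (greatest lower
bound) of $\{s \in \mathcal{S} \mid r \le_\mathcal{R} \varepsilon(s)\}$ exists for all $r \in \mathcal{R}$. A similar remark applies to
the identity in 5.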

Proposition 3. Let $\alpha$ and $\beta$ be the morphological opening and closing, respectively,
associated with an adjunction between $(\mathcal{S}, \le_\mathcal{S})$ and $(\mathcal{R}, \le_\mathcal{R})$. The following
properties hold:

1. Both operators are idempotent (i.e., $\alpha\alpha = \alpha$ and $\beta\beta = \beta$).

2. Both operators are increasing.

3. Both operators are bounded by the input; $\alpha(s) \le_\mathcal{S} s$ and $r \le_\mathcal{R} \beta(r)$, for all
$s \in \mathcal{S}$ and $r \in \mathcal{R}$. We say that $\alpha$ is anti-extensive and that $\beta$ is extensive.

¹ An operator $\psi : \mathcal{S} \to \mathcal{R}$ is called increasing if $s_1 \le_\mathcal{S} s_2$ implies that $\psi(s_1) \le_\mathcal{R} \psi(s_2)$.
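The definitions and propositions of this section can be checked numerically on a small example. The sketch below is a toy under stated assumptions: both posets are finite ranges of integers with the usual ordering, the erosion is integer division by 3, and the candidate adjoint dilation is multiplication by 3. None of these choices comes from the paper.

```python
def is_adjunction(S, R, eps, delta, leq_S, leq_R):
    """Check r <=_R eps(s)  <=>  delta(r) <=_S s for all s in S and r in R."""
    return all(leq_R(r, eps(s)) == leq_S(delta(r), s) for s in S for r in R)


S = range(0, 31)              # fine level
R = range(0, 11)              # coarse level
leq = lambda a, b: a <= b     # usual ordering on both levels

eps = lambda s: s // 3        # toy erosion: divide and round down
delta = lambda r: 3 * r       # toy dilation: its adjoint
opening = lambda s: delta(eps(s))
closing = lambda r: eps(delta(r))

print(is_adjunction(S, R, eps, delta, leq, leq))                 # True
print(is_adjunction(S, R, eps, lambda r: 3 * r + 1, leq, leq))   # False
print([opening(s) for s in range(8)])                            # [0, 0, 0, 3, 3, 3, 6, 6]
```

The opening rounds each value down to a multiple of 3, which makes its idempotence and anti-extensivity easy to see.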
3 Adjunctions and Pyramids

3.1 Background: Algebraic Axiomatics 

In [3], a general framework for pyramidal signal decomposition is presented.
It unifies linear and nonlinear multiresolution decompositions, by defining an
axiomatic characterization of the analysis and synthesis operators that generate
a pyramid. We briefly review this approach here.

A signal $s_0 \in \mathcal{S}_0$ is decomposed into a collection of coarse signals $s_j \in \mathcal{S}_j$,
$j = 1, 2, \ldots$, by means of a family of analysis operators $\psi_j^\uparrow : \mathcal{S}_j \to \mathcal{S}_{j+1}$, such
that $s_{j+1} = \psi_j^\uparrow(s_j)$. In linear pyramids, such as the Burt-Adelson pyramid, the
analysis operator is usually a linear decimation process: filtering (convolution)
followed by downsampling.

Signal synthesis, or approximation, is obtained by a family of synthesis operators
$\psi_j^\downarrow : \mathcal{S}_{j+1} \to \mathcal{S}_j$, which generate an approximation signal $\hat{s}_j$ to the signal $s_j$,
from the coarser one $s_{j+1}$, according to $\hat{s}_j = \psi_j^\downarrow(s_{j+1})$. In linear pyramids, the
synthesis operator is usually a linear interpolation process: upsampling followed
by filtering.

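In order to fulfill a series of intuitive conditions imposed on the analysis and
synthesis operators, it was shown in [?] that a single axiomatic condition should
hold for all $j$:

$$\psi_j^\uparrow \psi_j^\downarrow = \mathrm{id} \ \text{on } \mathcal{S}_{j+1}, \tag{1}$$

where id denotes the identity operator. Equation (1) is called the pyramid condition.
This condition implies that

$$\psi_j^\uparrow \psi_j^\downarrow \psi_j^\uparrow = \psi_j^\uparrow \quad \text{and} \quad \psi_j^\downarrow \psi_j^\uparrow \psi_j^\downarrow = \psi_j^\downarrow \tag{2}$$

$$\psi_j^\downarrow \psi_j^\uparrow \ \text{is idempotent.} \tag{3}$$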
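A small nonlinear pyramid makes the pyramid condition (1) and its consequences (2) and (3) concrete. The operators below are illustrative assumptions, not taken from the paper: analysis takes the minimum of each pair of samples, and synthesis repeats each coarse sample twice.

```python
def analysis(s):
    """Toy analysis operator: minimum of each pair of samples (even length assumed)."""
    return [min(s[2 * k], s[2 * k + 1]) for k in range(len(s) // 2)]


def synthesis(c):
    """Toy synthesis operator: repeat every coarse sample twice."""
    return [v for v in c for _ in range(2)]


coarse = [3, 1, 4]
fine = [5, 2, 7, 7, 1, 9]

print(analysis(synthesis(coarse)))           # [3, 1, 4], the pyramid condition (1)
approx = synthesis(analysis(fine))
print(approx)                                # [2, 2, 7, 7, 1, 1]
print(synthesis(analysis(approx)) == approx) # True, idempotence as in (3)
```

Synthesis followed by analysis returns the coarse signal unchanged, while analysis followed by synthesis only approximates the fine signal and is idempotent.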

3.2 Relationship between the Pyramid Condition and Adjunctions 

After a brief comparison, one may notice that pairs of analysis/synthesis operators
and adjunctions have some common properties. For instance, compare (2)
with items 1 and 2 in Proposition 2. Also, compare (3) with item 1 in Proposition 3.
This leads us to wonder if there is a deeper relation between these
structures.

In this section, we prove that any family of analysis and synthesis operators 
satisfying the pyramid condition forms a family of adjunctions between posets, 
with respect to an appropriate family of partial orderings. Even though it is not 
true that every adjunction satisfies the pyramid condition, the latter is satisfied 
if the erosion is surjective. 



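From Pyramids to Adjunctions.

Assume that the pyramid condition of Subsect. 3.1 holds. We shall next endow
every space $\mathcal{S}_j$ with a partial ordering such that the analysis and synthesis
operators become erosions and dilations, respectively. The operators $\psi_j^\uparrow$ and $\psi_j^\downarrow$
induce, on each set $\mathcal{S}_j$, the following partial ordering $\le_j$:

$$\forall s', s \in \mathcal{S}_j, \quad s' \le_j s \iff \begin{cases} s' = s, \text{ or} \\ \exists J > j : s' = \hat\psi_{[j,J]}(s), \end{cases} \tag{4}$$

where $\hat\psi_{[j,J]} = \psi^\downarrow_{[j,J]} \psi^\uparrow_{[j,J]}$, $\psi^\uparrow_{[j,J]} = \psi^\uparrow_{J-1} \psi^\uparrow_{J-2} \cdots \psi^\uparrow_j$ and $\psi^\downarrow_{[j,J]} = \psi^\downarrow_j \psi^\downarrow_{j+1} \cdots \psi^\downarrow_{J-1}$.

The above series of partial orderings are illustrated in Figure 1.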




Fig. 1. Series of partial orderings generated by families of analysis/synthesis 
operators. Each partial ordering <j is depicted schematically on a horizontal 
line, which increases from left to right. For each <j, a signal is smaller than 
another if it can be obtained from the latter by a same number of upstream and 
downstream analysis/synthesis operations. 



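Proposition 4. For any $j$ and any $J > j$, $(\psi^\uparrow_{[j,J]}, \psi^\downarrow_{[j,J]})$ is an adjunction between
$(\mathcal{S}_j, \le_j)$ and $(\mathcal{S}_J, \le_J)$.

Proposition 5. For any $j$ and any $J > j$, $\psi^\downarrow_{[j,J]} \psi^\uparrow_{[j,J]}$ is the morphological opening
associated with $(\psi^\uparrow_{[j,J]}, \psi^\downarrow_{[j,J]})$.

Corollary 1. For any $j$, $(\psi_j^\uparrow, \psi_j^\downarrow)$ is an adjunction between $(\mathcal{S}_j, \le_j)$ and
$(\mathcal{S}_{j+1}, \le_{j+1})$, and $\psi_j^\downarrow \psi_j^\uparrow$ is the associated morphological opening.

We should note that the family of partial orderings defined in (4) is not
necessarily the only one for which Propositions 4, 5 and Corollary 1 hold.

From Adjunctions to Pyramids.

Suppose that we are given a family of posets $(\mathcal{S}_j, \le_j)$, and adjunctions $(\varepsilon_j, \delta_j)$
between $\mathcal{S}_j$ and $\mathcal{S}_{j+1}$. Suppose that $\varepsilon_j$ is surjective, that is, any element $s'$ in
$\mathcal{S}_{j+1}$ is of the form $\varepsilon_j(s)$ for some $s \in \mathcal{S}_j$. We then get for all $s' \in \mathcal{S}_{j+1}$:

$$\varepsilon_j\delta_j(s') = \varepsilon_j\delta_j\varepsilon_j(s) = \varepsilon_j(s) = s'. \tag{5}$$

In other words:

$$\varepsilon_j\delta_j = \mathrm{id} \ \text{on } \mathcal{S}_{j+1}, \tag{6}$$

which is the pyramid condition. Therefore, if $\varepsilon_j$ is surjective, then the adjunction
$(\varepsilon_j, \delta_j)$ consists of an analysis/synthesis pair.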



4 Adjunctions for Curve Evolution 

In this section, we consider the application of adjunctions and the pyramid con- 
dition in the area of curve evolution. 



4.1 Motivation 

The strength of curve evolution for shape filtering is well known and documented
in the literature (see [?] for example). Important properties, like affine invariance,
can be obtained by curve evolution methods, and they have been embedded
and studied within a solid theoretical framework.

However, one disadvantage of classical curve evolution methods for denoising
is the fact that the curve shrinks as time passes, and therefore the evolved curve
gets farther and farther from the original one. For illustration, a discrete example
can be seen in Fig. 2. There, a noisy C-shaped curve is submitted to several
evolution iterations (a detailed description of this flow is given in the sequel),
and the result is a nearly-elliptic, noiseless, shrunken version of the original curve.
Although the noise is removed, the final shape does not resemble the original
one; it is too distorted to consider this as a successful denoising operation.

In order to reduce this problem, one might (i) keep the number of iterations 
low, and (ii) rescale the final curve in order to have their contours match as 
much as possible, e.g., expand the final curve so it has the same area as the 
original one. Although this indeed yields an improvement, it does not solve the 
problem entirely: a small number of iterations may not remove enough noise 
energy, whereas rescaling keeps the final shape, possibly distorted. 

An alternative approach to solve the problem is to apply the inverse flow to 
the outcome of the forward flow. This would bring the contour of the final curve 
somehow “close” to the original one. Others have disregarded this approach for 
two main reasons: First, in many important cases, the forward flow is mathe- 
matically invertible, which would make the forward/inverse composition equal to 
identity, and therefore useless. In practice though, and that is the second reason, 
the inverse flow is unstable, and the curve usually “explodes” after a few inverse 
iterations. 
Fig. 2. Affine-invariant polygon evolution. The original curve is shown in thin,
solid line, the curve after 30 iterations is shown bold-faced, and the dotted curves
depict the evolution.



Some accepted ways to really avoid the above problem are (i) to use a switch
to toggle between different flows or (ii) to use a constrained optimization
procedure [?]. The stable pseudo-inverse operators discussed in [?] for shape
enhancement and exaggeration might also give fine results. Below we propose an
alternative approach, using adjunction pyramids, which leads to an idempotent
shape filter.



4.2 Proposed Approach — General Idea 

The reason why the exact inverse flow is not achievable in finite-state machines
is that the forward flow calculation is actually a composition of the filtering
operation with quantization. This quantization is a result of arithmetic rounding
in all steps of the computation, which slightly distorts the exact final results,
but, more importantly, causes the whole operation to become non-invertible.

In order to obtain a stable inverse flow, our proposed approach is to design
an adjunction pyramid $(\varepsilon_t, \delta_t)$, where the erosion $\varepsilon_t$ describes the forward flow
as well as the quantization from time $t$ to $t + 1$, and the adjoint dilation $\delta_t$ is
the (pseudo-) inverse flow step. Starting with a curve $C_0$ at time 0 we obtain
the curve $\mathcal{E}_t(C_0) = \varepsilon_{t-1}\varepsilon_{t-2} \cdots \varepsilon_0(C_0)$ at time $t$. From the theory on adjunctions
we know that $\mathcal{E}_t$ is an erosion between $\mathcal{S}_0$ and $\mathcal{S}_t$ with adjoint dilation given
by $\Delta_t = \delta_0\delta_1 \cdots \delta_{t-1}$ mapping $\mathcal{S}_t$ into $\mathcal{S}_0$. Furthermore, we get that $(\mathcal{E}_t, \Delta_t)$ is
an adjunction. Therefore, composition of the forward flow with the “inverse”
flow, i.e., $\alpha_t = \Delta_t\mathcal{E}_t$, which is the associated morphological opening (of “size” $t$,
in morphology jargon), will be stable, idempotent, and could be regarded as an
ideal filter. Moreover, according to item 1 in Proposition 2, $\mathcal{E}_t\Delta_t\mathcal{E}_t$ is equal to $\mathcal{E}_t$,
which means that the “inverse” flow $\Delta_t$ preserves the information retained by $\mathcal{E}_t$.
Also, $\alpha_t$ is increasing and anti-extensive with respect to some partial ordering,
and the latter characterizes the nature of the information removal performed by
the forward flow $\mathcal{E}_t$.

The question is how to calculate the “inverse” flow step $\delta_t$, given the forward
one $\varepsilon_t$. In the next subsection we propose a guideline for adjunction design, using
a given measure. A simple case where it is possible to calculate a closed-form
solution is studied in Section 4.4. In general however this is not a straightforward
task, and iterative solutions may be required.
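The algebra of this subsection can be mimicked with a deliberately simple stand-in for a curve flow. In the sketch below, which is an assumption-laden toy and not the polygon flow of the paper, a "shape" is a non-negative integer, the forward step halves it and rounds down (smoothing followed by quantization), and the adjoint step doubles it.

```python
def forward_step(s):
    """Toy erosion step: halve and round down (smoothing plus quantization)."""
    return s // 2


def inverse_step(r):
    """Adjoint dilation of forward_step on the non-negative integers."""
    return 2 * r


def forward_flow(s, t):
    """Composite forward flow: t forward steps."""
    for _ in range(t):
        s = forward_step(s)
    return s


def inverse_flow(r, t):
    """Composite pseudo-inverse flow: t adjoint steps."""
    for _ in range(t):
        r = inverse_step(r)
    return r


def opening(s, t):
    """Forward flow followed by the pseudo-inverse flow."""
    return inverse_flow(forward_flow(s, t), t)


print(opening(37, 3))               # 32
print(opening(opening(37, 3), 3))   # 32, the opening is idempotent
```

The forward step is not invertible, yet the composite opening is stable and idempotent, and applying the forward flow to the opened value loses nothing further.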

4.3 Adjunction Design Using a Measure 

Suppose there exists a measure (or functional) $\mu_t$ defined on $\mathcal{S}_t$, which
characterizes in a satisfactory way the “simplicity” or “smoothness” of a shape or
curve. For instance, $\mu_t$ could be the perimeter or the area of the shape. Define
$\delta_t : \mathcal{S}_{t+1} \to \mathcal{S}_t$ by means of the minimization problem

$$\delta_t(s) = \arg\min \{\, \mu_t(r) \mid r \in \mathcal{S}_t \text{ and } \varepsilon_t(r) = s \,\}. \tag{7}$$

Note that this definition presupposes that the minimization problem has a unique
solution for every $s \in \mathcal{S}_{t+1}$. In that case we have $\varepsilon_t\delta_t(s) = s$, hence $(\varepsilon_t, \delta_t)$
satisfies the pyramid condition, and is therefore an adjunction.

The proposed approach is therefore to use (7) to calculate the “inverse” flow,
assuming that the required conditions of existence and uniqueness are satisfied.
If these conditions are not satisfied, then other slightly different approaches could
be considered, but these fall outside the scope of this paper.

The underlying partial ordering induced by the resulting adjunction is
intimately related to the choice of measure $\mu_t$. Specifically, let $(\varepsilon_t, \delta_t)$ be an
adjunction obtained by means of (7), and associated to partial orderings $\leq_t$ on $\mathcal{S}_t$.
Let $\leq$ be the usual ordering on the real numbers. Then, for all $s_1, s_2 \in \mathcal{S}_t$, the
partial ordering $\leq_t$ satisfies:

$$s_1 \leq_t s_2 \;\Rightarrow\; \mu_t(s_1) \leq \mu_t(s_2). \tag{8}$$

In many cases, of which the case study below is one, the various
steps $\varepsilon_t$ whose composition yields the forward flow $\mathcal{E}_t$ are time-independent,
i.e., $\varepsilon_t = \varepsilon$. In that case we have $\mathcal{E}_t = \varepsilon^t$, the $t$'th power of $\varepsilon$. If we choose the
measures $\mu_t$ to be independent of $t$, we also get $\delta_t = \delta$, hence $\Delta_t = \delta^t$.

4.4 Case Study: Polygon Affine Evolution

In [7] Bruckstein, Sapiro, and Shaked presented a simple scheme for
affine-invariant evolution of polygons. It can be regarded as a discrete, non-geometric
version of affine evolution of continuous curves.

Given an $N$-vertex polygon $\mathcal{P}_0$ with vertex points $X_0(i) \in \mathbb{R}^2$, $i = 0, \ldots, N-1$,
perform an affine evolution using the rule:

$$X_{t+1}(i) = \tfrac{1}{3}\,\bigl[X_t(i-1) + X_t(i) + X_t(i+1)\bigr], \tag{9}$$

where $X_t(i)$ are the vertices of the polygon after the $t$-th iteration and $i \pm 1$
are taken modulo $N$. It was shown in [7] that any polygon submitted to this
flow converges to a vanishing polygonal ellipse. The discrete flow previously
presented in Figure 2 is an example of polygon affine evolution. Equation (9)
consists of a linear cyclic convolution of the polygon coordinates. If we write
$X_t(i) = (x_t(i), y_t(i))$ for every $i$, then the evolution amounts to filtering the
functions $x_t(i)$ and $y_t(i)$ separately by the filter $(\tfrac13, \tfrac13, \tfrac13)$. It can be shown that,
if $N$ is not a multiple of 3, then the above filtering operation is invertible. That
means that, in principle, the polygon flow can be inverted. However, in practice,
the inverse flow is unstable.
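
As a minimal sketch of the evolution step, assuming the rule is the three-point cyclic average described by the filter above, the following Python fragment applies one iteration to a list of vertices and checks the invertibility condition through the frequency response of the filter. The function names `evolve`, `frequency_response` and `is_invertible`, and the tolerance `tol`, are illustrative choices and do not come from the paper.

```python
import math

def evolve(polygon):
    """One step of the cyclic averaging rule: each vertex becomes the mean of
    itself and its two neighbours, with indices taken modulo N."""
    n = len(polygon)
    return [
        (
            (polygon[i - 1][0] + polygon[i][0] + polygon[(i + 1) % n][0]) / 3.0,
            (polygon[i - 1][1] + polygon[i][1] + polygon[(i + 1) % n][1]) / 3.0,
        )
        for i in range(n)
    ]

def frequency_response(k, n):
    """Discrete Fourier transform of the cyclic filter (1/3, 1/3, 1/3) at
    frequency k for an n-vertex polygon; the value is real."""
    return (1.0 + 2.0 * math.cos(2.0 * math.pi * k / n)) / 3.0

def is_invertible(n, tol=1e-12):
    """The filtering is invertible iff no frequency response vanishes
    (tol is an assumed numerical threshold for 'zero')."""
    return all(abs(frequency_response(k, n)) > tol for k in range(n))

square = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
print(evolve(square))                      # the square shrinks towards its centroid
print(is_invertible(5), is_invertible(6))  # 6 is a multiple of 3
```

The response vanishes exactly when $\cos(2\pi k/N) = -\tfrac12$, which requires $N$ to be a multiple of 3; for other $N$ the inverse exists but divides by small values of $F(k)$, which is one way to see why it is unstable in practice.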

Because of the cyclic nature of the above convolution, this operation can be
performed very easily in the Fourier domain. Henceforth we denote the vertex
points of the $N$-vertex polygon $\mathcal{P}$ by $\mathbf{x}_{\mathcal{P}}(i) = (x_{\mathcal{P}}(i), y_{\mathcal{P}}(i))$. Denote by $X_{\mathcal{P}}(k)$
the (coordinate-wise) Fourier transform of $\mathbf{x}_{\mathcal{P}}(i)$. The evolution process described
by (9) can be reformulated by applying the Fourier transform to both sides. Then
we arrive at:

$$X_{\mathcal{P}_{t+1}}(k) = F(k)\,X_{\mathcal{P}_t}(k), \tag{10}$$

where $F(k)$ is the frequency response of the filter $(\tfrac13, \tfrac13, \tfrac13)$, which is a real-valued
function. Rather than using this cartesian representation, we will represent the
Fourier transform of $\mathcal{P}$ by its polar form $[A_{\mathcal{P}}(k), \theta_{\mathcal{P}}(k)]$, where $A_{\mathcal{P}}$ is the
amplitude and $\theta_{\mathcal{P}}$ the phase. When reformulated in terms of polar coordinates, (10)
looks as follows:

$$A_{\mathcal{P}_{t+1}}(k) = |F(k)|\,A_{\mathcal{P}_t}(k) \tag{11}$$

$$\theta_{\mathcal{P}_{t+1}}(k) = \bigl(\theta_{\mathcal{P}_t}(k) + \mathrm{phase}(F(k))\bigr) \bmod 2\pi. \tag{12}$$

Since $F$ is real-valued, its phase can only assume the values 0 and $\pi$. The
expressions in (11)-(12) do not yet take quantization effects into account. Below
we will give formal expressions for the erosion and dilation which constitute the
pyramid associated with the affine evolution in (9) and which do also include
quantization effects. Note however that, thanks to the fact that $F$ is real-valued,
we only need to deal with quantization of the amplitude term in (11). Note also
that the spaces $\mathcal{S}_t$ which define the consecutive levels of the pyramid do not
depend on $t$, i.e., $\mathcal{S}_t = \mathcal{S}$ for all $t \geq 0$, and the same can be said for the erosions
and dilations, i.e., $\varepsilon_t = \varepsilon$ and $\delta_t = \delta$. The latter means in particular that the
evolution of a curve from time 0 to $t$ is governed by $\varepsilon^t$, the $t$'th iterate of $\varepsilon$.

We define $q(k)$ to be the quantization step for the amplitude at frequency $k$.
Let $\mathcal{S}$ be the set of $N$-vertex polygons $\mathcal{P}$ for which the amplitude function $A_{\mathcal{P}}(\cdot)$
is quantized, i.e., $A_{\mathcal{P}}(\cdot)/q(\cdot)$ is integer-valued. Define the erosion $\varepsilon : \mathcal{S} \to \mathcal{S}$ in
terms of the polar representation $(A_{\mathcal{P}}, \theta_{\mathcal{P}})$ of the Fourier transform of $\mathcal{P}$:

$$A_{\varepsilon(\mathcal{P})}(k) = \left\lfloor \frac{|F(k)|\,A_{\mathcal{P}}(k)}{q(k)} \right\rfloor q(k) \tag{13}$$

$$\theta_{\varepsilon(\mathcal{P})}(k) = \bigl(\theta_{\mathcal{P}}(k) + \mathrm{phase}(F(k))\bigr) \bmod 2\pi, \tag{14}$$

where $\lfloor\cdot\rfloor$ denotes the floor function. Furthermore, we define the adjoint dilation
$\delta$ using a minimization procedure as in (7), using for $\mu$ the energy

$$\mu(\mathcal{P}) = \Bigl(\sum_{k=0}^{N-1} A_{\mathcal{P}}(k)^2\Bigr)^{1/2}. \tag{15}$$

This yields, in terms of $X_{\mathcal{P}}$, the expression

$$X_{\delta(\mathcal{P})}(k) = F^{-1}(k)\cdot X_{\mathcal{P}}(k), \tag{16}$$

where $F^{-1}(k)$ is the pseudo-inverse of $F(k)$, i.e.,

$$F^{-1}(k) = \begin{cases} 1/F(k), & F(k) \neq 0, \\ 0, & F(k) = 0. \end{cases}$$

In Fig. 3 one can observe the result of applying the forward and inverse flows to
“noisy” polygons (thin, solid line). The dotted curves are the result of 30
iterations of the forward affine flow. Our simulations were performed using Matlab
with a uniform quantization step $q(k)$ equal to $10^{-10}$. The bold-faced curves are
the result of applying 30 iterations of the “inverse” flow on the dotted curves.
In morphological terms, the bold-faced curves in Fig. 3 are the opening of the
original curves with “size” 30. As stressed before, the opening can be regarded as
an ideal filter; like any opening, it is idempotent, increasing and anti-extensive,
where increasing and anti-extensive are with respect to the underlying partial
ordering induced by $(\mathcal{E}_t, \Delta_t)$. In this case, anti-extensivity means producing a
“smoother” curve, where the characteristics of this smoothing are dictated by
the chosen energy measure minimization and the nature of the erosion operation
(i.e., the affine flow). In the continuous case, the affine flow reduces the affine
arc length of the curve; a similar quantity might be the reducing criterion for
the polygon flow, but this is yet to be investigated.

Fig. 3. Forward and inverse affine-invariant polygon flow. The original curves
are presented in thin, solid lines. After 30 iterations of forward flow (erosion),
the dotted curves are obtained. After a further 30 iterations of inverse flow (adjoint
dilation), one obtains the bold-faced curves.

Notice that, since $\mathcal{E}_t\Delta_t\mathcal{E}_t = \mathcal{E}_t$, the dotted curves are obtained again after
further applying 30 iterations of the forward flow to the bold-faced curves.
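
The interplay between the quantized erosion and the pseudo-inverse dilation can be illustrated on a single Fourier amplitude. The sketch below is a toy example, not the authors' Matlab code: it assumes a rational gain in place of $|F(k)|$ and a coarse quantization step, so that exact arithmetic with `fractions.Fraction` can be used. The helper names `erode_amplitude`, `dilate_amplitude` and `opening` are hypothetical.

```python
from fractions import Fraction
import math

def erode_amplitude(a, gain, q):
    """Forward step on one amplitude: scale by |gain|, then round down to a
    multiple of the quantization step q."""
    return math.floor(abs(gain) * a / q) * q

def dilate_amplitude(b, gain):
    """Pseudo-inverse step: divide by |gain|, or return 0 where the gain vanishes."""
    return b / abs(gain) if gain != 0 else Fraction(0)

def opening(a, gain, q, size=1):
    """`size` forward steps followed by `size` pseudo-inverse steps."""
    for _ in range(size):
        a = erode_amplitude(a, gain, q)
    for _ in range(size):
        a = dilate_amplitude(a, gain)
    return a

gain, q, a = Fraction(2, 3), Fraction(1, 10), Fraction(1)   # assumed toy values
opened = opening(a, gain, q)
print(opened)                                    # amplitude after one opening
print(opening(opened, gain, q) == opened)        # idempotence
print(erode_amplitude(opened, gain, q) == erode_amplitude(a, gain, q))
```

With these toy values the opening lowers the amplitude once and then leaves it fixed, and eroding the opened amplitude gives the same result as eroding the original one, which is the single-frequency counterpart of $\mathcal{E}_t\Delta_t\mathcal{E}_t = \mathcal{E}_t$.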

The amount of smoothness produced by the opening operation is directly
related to the “size” parameter, i.e., to the number of iterations of forward and
inverse flows; see Fig. 4. For “size” 0, the opening is the identity operator. As
“size” increases, the filtered curve becomes smoother, and later on (not shown)
tends first to an ellipse, then to a circle, and, for an infinite “size”, to a point.




Fig. 4. Series of openings with increasing “size” . The original curve is in thin, 
solid line, the bold-faced curve is the opening with parameter 30 (corresponding 
to 30 iterations of forward flow plus 30 iterations of inverse flow). The dotted 
curves show the result of openings with various “sizes” . 



Observe that in our approach one can distinguish two different scale-spaces.
One is generated by the erosion (the forward flow), and evolves as the time
parameter $t$ increases; this is the scale-space illustrated in Fig. 2. The second
scale-space is generated by the opening (the ideal filter), as its “size”
parameter (which is identical to the time parameter $t$) increases. This scale-space is
illustrated in Fig. 4. Notice that there exists a one-to-one correspondence between
both scale-spaces: every level of the opening scale-space can be obtained from
the corresponding one in the erosion scale-space by applying the “inverse” flow.
Similarly, every level of the erosion scale-space can be obtained from the opening
scale-space by using the forward flow again. This means that these two
scale-spaces carry exactly the same information, differing only by their representations.

One of the main conclusions of this study is that, in many applications, such
as denoising, one may be more interested in the opening scale-space, rather than
the erosion one.

5 Conclusion

In this paper, we showed how the concept of adjunctions, so central in math- 
ematical morphology, can be adopted by other disciplines as well. An intimate 
relationship between adjunctions and pyramids was proven, and the creation of 
idempotent shape filters based on curve evolution schemes was suggested and 
illustrated. 

The close, intuitive relationship existing in general between pyramids and
scale-spaces hints that adjunctions could also play a central role in scale-space
theory in general. The case study described here also suggests such a link, and
introduces the concepts of erosion scale-spaces and opening scale-spaces. Hints
regarding these entities can also be found in earlier work, where summation and
supremal scale-spaces are defined and studied. Thus, an investigation of the
relationship between adjunctions and scale-spaces is in order, and it is under way.

Abstract. Segmenting an image amounts to producing a partition, in which each
tile represents an object of the image. Given an image, how should it be segmented
into a predetermined number of regions? How should the objects to represent or
discard be selected when the number of regions varies? Producing a series of nested
partitions, or hierarchy, is an answer to these questions and is also central to
practically all morphological segmentation approaches. In the present paper, we
define, study and construct such hierarchies, and show how to use them for various
segmentation or filtering tasks.



1 Introduction 

Segmenting an image amounts to producing a partition, in which each tile represents
an object of the image. Segmentation is an extremely difficult task, as it requires some
degree of semantic understanding of the images. The classical tool provided by
mathematical morphology for segmenting images is the watershed, which sets the limits of
the catchment basins of a topographic surface. The semantic analysis of the image is
performed through the selection of a set of markers. By marker, we mean a binary set
included in the object of interest; its exact location or shape has no importance. The
strategies for finding markers are diverse and problem dependent; many case studies
are listed in the literature. This classical segmentation process may be seen as a
two-stage process. In a first stage the watershed of the gradient image is constructed;
each frontier between two adjacent catchment basins is weighted by the altitude of its
lowest pixel, which is a pass point between them. Suppressing all frontiers below some
threshold $\lambda$ produces a coarser partition. For increasing values of $\lambda$, one obtains a
series of nested partitions, also called a hierarchy. The second stage of segmentation
will then select the boundaries with the highest weight separating the markers. The
quality of segmentation will depend upon the correct choice of markers and on the
quality of the boundaries present in the hierarchy. Many efforts have focused on the
selection of good markers and much less attention has been given to the construction of
alternative hierarchies. The present paper focuses on the many ways to derive
meaningful hierarchies from an image, in order to enrich the palette of available
segmentation tools. We first define the hierarchies, study their algebraic structures and
show how to construct new hierarchies by combination of preexisting ones. In a second
section we show how the watershed line associated to increasing floodings of a
topographic surface is able to construct hierarchies. We then present a series of
particular modes of flooding and characterise the associated hierarchies. A third section
shows new means to use hierarchies in segmentation. A last section considers another
type of hierarchy associated to flooding, with the aim of simplifying and filtering images
without blurring or displacing contours.

2 The Lattice of Hierarchies 

We are interested in segmenting images, that is functions of $\mathrm{Fun}(E,T)$ where $E$
represents the support of the images (a continuous domain or a discrete grid, in any number
of dimensions) with values in $T$ (in practice the set of reals or integers). The power set
$\mathcal{P}(E)$ of $E$ contains all subsets of $E$. The result of any segmentation of an image $f$ of
$\mathrm{Fun}(E,T)$ will be a partition $\Theta$ of $E$, that is a family $(X_i)$ of elements of $\mathcal{P}(E)$
verifying: $X_i \cap X_j = \emptyset$ for $i \neq j$ and $\bigcup_i X_i = E$. Often we are interested in representing an
image not only as one partition but as a series of partitions with an increasing number of
regions. Of particular interest is the case where these partitions are nested: every
contour of a partition also belongs to all finer partitions. In such a case, coarser partitions
are obtained by merging adjacent regions of finer partitions. Such nested partitions are
called hierarchies. We now give an axiomatic definition of hierarchies and study
their properties.

2.1 Definition of a Hierarchy 

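2.1.1 Definition of a Tree and Its Elements. Let $\mathcal{A}$ be a subset of $\mathcal{P}(E)$, on which
we consider the inclusion order relation. $\mathcal{A}$ is a dendrogram if the following axiom is
verified:

Axiom 1. (Dendrogram Axiom) $\forall A, U, V \in \mathcal{A}$:

$A \subset U$ and $A \subset V \Rightarrow U \subset V$ or $V \subset U$

If $\mathcal{A}$ is a dendrogram, we may define:

- the summits: $\mathrm{Sum}(\mathcal{A}) = \{A \in \mathcal{A} \mid \forall B \in \mathcal{A} : A \subset B \Rightarrow A = B\}$

- the leaves: $\mathrm{Leav}(\mathcal{A}) = \{A \in \mathcal{A} \mid \forall B \in \mathcal{A} : B \subset A \Rightarrow A = B\}$

- the nodes: $\mathrm{Nod}(\mathcal{A}) = \mathcal{A} - \mathrm{Leav}(\mathcal{A})$

- the predecessors: $\mathrm{Pred}(A) = \{B \in \mathcal{A} \mid B \subset A\}$

- the successors: $\mathrm{Succ}(A) = \{B \in \mathcal{A} \mid A \subset B\}$

2.1.2 Definition of a Hierarchy. $\mathcal{A}$ is a hierarchy if the two following axioms are
verified:

Axiom 2. (Intersection Axiom) Two elements of $\mathcal{A}$ which are not comparable for the
inclusion order have an empty intersection: $\forall A, B \in \mathcal{A}$: $A \cap B \in \{A, B, \emptyset\}$

Axiom 3. (Union Axiom) Any element $A$ of $\mathcal{A}$ is the union of all other elements of $\mathcal{A}$
contained in $A$:

$\forall A \in \mathcal{A} : \bigcup \{B \in \mathcal{A} \mid B \subset A,\ B \neq A\} \in \{A, \emptyset\}$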

Proposition 4. The intersection axiom implies that $\mathcal{A}$ is a dendrogram for the inclusion
order.

Proof. If $A \subset U$ and $A \subset V$, then $U \cap V \neq \emptyset$, implying that $U \cap V = U$ or $U \cap V = V$,
that is $U \subset V$ or $V \subset U$, showing that the dendrogram axiom is satisfied.

2.2 Stratified Hierarchies, Ultrametric Distances and Nested Partitions

$\mathcal{A}$ is a stratified hierarchy if it is equipped with an index function st from $\mathcal{A}$ into $\mathbb{R}$
which is strictly increasing with the inclusion order: $\forall A, B \in \mathcal{A}$:

$A \subset B$ and $B \neq A \Rightarrow \mathrm{st}(A) < \mathrm{st}(B)$

2.2.1 Ultrametric Distances on a Stratified Hierarchy. Given a stratified hierarchy
$\mathcal{A}$, for which the smallest stratification index st is equal to zero, a distance between the
elements of $\mathcal{P}(E)$ is defined by $d(C, D)$, the index of the finest partition in which a tile
contains both sets $C$ and $D$: $\forall C, D \in \mathcal{P}(E)$

$d(C, D) = \inf \{\mathrm{st}(A) \mid A \in \mathcal{A},\ C \subset A \text{ and } D \subset A\}$

Properties: $d$ is an ultrametric distance index:

$\forall A, B \in \mathcal{A}$: $d(A, B) = 0 \Rightarrow A = B$
$\forall C, D \in \mathcal{P}(E)$: $d(C, D) = d(D, C)$
$\forall B, C, D \in \mathcal{P}(E)$: $d(C, D) \leq \max \{d(C, B), d(B, D)\}$

This last inequality is called the ultrametric inequality; it is stronger than the triangular
inequality. It expresses that the index of the smallest tile containing $C$ and $D$ is smaller
than or equal to the index of the smallest tile containing all three elements $B$, $C$ and $D$.

For $X \in \mathcal{P}(E)$ the closed ball of centre $X$ and radius $\rho$ is defined by $\mathrm{Ball}(X, \rho) =$
$\{D \in \mathcal{P}(E) \mid d(X, D) \leq \rho\}$. Each element of $\mathrm{Ball}(X, \rho)$ is a centre of the ball.
Furthermore the radius of a ball is equal to its diameter.

Two closed balls $\mathrm{Ball}(X, \rho)$ and $\mathrm{Ball}(Y, \rho)$ with the same radius are either disjoint
or identical: the balls of radius $\rho$ form a partition. For increasing values of $\rho$ we obtain
nested partitions.



2.2.2 Stratification Associated to Nested Partitions. Conversely, the union of all tiles
belonging to a series of nested partitions $(\Theta_i)$ constitutes a hierarchy $\mathcal{A}$. Such a series of
nested partitions $(\Theta_i)$ may easily be generated from an initial fine partition $\Theta_0 = \bigcup R_i$,
$i = 1, \ldots, n$, on which a dissimilarity index $\delta$ is defined between neighboring tiles:
if we merge all tiles of $\Theta_0$ with a dissimilarity index below a given threshold $\lambda$, we
obtain a coarser partition with a stratification index equal to $\lambda$. For increasing values
of $\lambda$ we obtain a series of nested partitions, forming a hierarchy $\mathcal{A}$. Of course, many
different dissimilarity measures may be considered to weight the boundaries between
adjacent tiles. If the tessellation is the result of the watershed construction on a gradient
image, this dissimilarity measure can be defined as the lowest (or average) grey level
value of the gradient image along the border separating the two regions. Other possible
measures are color distances, various measures of local contrast, or even motion or
texture dissimilarity.

Such types of hierarchies can most economically be described as a weighted tree,
defined as follows. The region adjacency graph (RAG) associated to $\Theta_0$ is a
non-directed graph $G = (X, U)$, where $X$ is the set of nodes and $U$ is the set of edges. Each
element $x_i \in X$ represents a region $R_i \in \Theta_0$. Two elements $x_j$ and $x_k$ are linked by an
edge $u_{jk}$ if and only if the corresponding regions $R_j$ and $R_k$ are neighbors; the edge is
assigned a weight $w_{jk} = \delta(x_j, x_k)$ measuring the dissimilarity between both regions.
Let us suppose furthermore that all weights are different (this is not really a restriction,
as in case of ties it is always possible to introduce micro differences between
them). A path on a graph is classically defined as a series of nodes $(x_1, x_2, \ldots, x_n)$ such
that $x_i$ and $x_{i+1}$ are linked by an edge. We define the altitude of a path as the highest
weight of the edges along this path; this weight is called the sup-section of the path. It is
a well-known result that the union of all paths of smallest sup-section between two
arbitrary nodes of the RAG is the minimum spanning tree (MST) $T$ of the RAG:

* it spans the RAG: all nodes belong to it.

* it is a tree: it has no cycles. Between any two nodes there is a unique path on the tree.

* the total sum of weights of $T$ is minimal among all possible spanning trees.




Fig. 1. Four equivalent representations of nested partitions:

- as a fine partition with a dissimilarity index between neighboring regions

- as a topographic surface, where the height of the dam between regions represents their
dissimilarity

- as a region adjacency graph

- as the minimum spanning tree of the RAG.



There exists a unique path between the nodes $x_i$ and $x_j$ on the tree $T$. Let $\lambda$ be the
highest weight on this path. Merging all tiles of $\Theta_0$ with a dissimilarity index below
$\lambda$ produces a coarser partition where $x_i$ and $x_j$ belong to different tiles; merging also
the tiles with dissimilarity equal to $\lambda$ produces a still coarser partition for which $x_i$
and $x_j$ now belong to the same tile. The smallest stratification index for which two regions
$R_i$ and $R_j$ belong to the same tile of the hierarchy constitutes an ultrametric
distance between the corresponding nodes $x_i$ and $x_j$. We call this ultrametric distance,
$d(x_i, x_j) = \lambda$, the level distance between $x_i$ and $x_j$, as it is the weight of the highest
edge on the unique path of $T$ between $x_i$ and $x_j$. It is the greatest ultrametric distance
which is below the dissimilarity $\delta$ and is called the subdominant ultrametric distance
associated to $\delta$.
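
The construction above can be sketched in a few lines of Python. This is an illustrative sketch only: the MST is computed here with Kruskal's algorithm (a standard choice that the text does not prescribe), regions are numbered from 0, and the names `minimum_spanning_tree` and `level_distance` as well as the toy weights are assumptions made for the example.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm on a RAG with n nodes.
    edges: list of (weight, j, k) with distinct weights; returns the MST edges."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, j, k in sorted(edges):
        rj, rk = find(j), find(k)
        if rj != rk:              # the edge joins two different components
            parent[rj] = rk
            tree.append((w, j, k))
    return tree

def level_distance(n, tree, a, b):
    """Highest edge weight on the unique path of the tree between a and b."""
    adjacency = {i: [] for i in range(n)}
    for w, j, k in tree:
        adjacency[j].append((k, w))
        adjacency[k].append((j, w))
    stack = [(a, None, 0)]
    while stack:
        node, previous, highest = stack.pop()
        if node == b:
            return highest
        for nxt, w in adjacency[node]:
            if nxt != previous:
                stack.append((nxt, node, max(highest, w)))
    raise ValueError("nodes are not connected")

# Toy RAG with four regions and assumed dissimilarities.
rag = [(1, 0, 1), (4, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
mst = minimum_spanning_tree(4, rag)
print(mst)
print(level_distance(4, mst, 1, 3))
```

In this toy graph regions 1 and 3 are direct neighbours with dissimilarity 5, yet their level distance is lower, because a path of smaller sup-section exists through regions 0 and 2; this is what "subdominant" means in practice.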

Conversely, let us consider a spanning tree $T$. To any distribution of weights $W =$
$(w_{jk})$ on the edges of $T$ is associated an ultrametric distance $d_W(x_i, x_j)$, equal to the
weight of the highest edge on the unique path between $x_i$ and $x_j$. This property will
be exploited intensively below in order to generate a variety of useful hierarchies. In
summary we have four equivalent representations for stratified hierarchies associated to
a partition with a dissimilarity index (see fig. 1). The first is the partition itself with the
dissimilarity index $\delta$ between neighboring tiles. The second represents a topographic
surface, in which a dam of zero thickness separates adjacent tiles; its height is equal to the
distance $\delta$ between the tiles; this representation will help unify all morphological
segmentation algorithms in terms of flooding. The third is the RAG and the last is the
minimum spanning tree of the RAG.

2.3 The Lattice of Hierarchies 

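2.3.1 Supremum and Infimum of Two Hierarchies. Let $\mathcal{A}$ and $\mathcal{B}$ be two stratified
hierarchies, with their associated distances $d_{\mathcal{A}}$ and $d_{\mathcal{B}}$. The following relation defines
an order relation between the hierarchies:

$\mathcal{B} \leq \mathcal{A} \Leftrightarrow \forall C, D \in \mathcal{P}(E) : d_{\mathcal{A}}(C, D) \leq d_{\mathcal{B}}(C, D)$

With this order relation the stratified hierarchies of $\mathcal{P}(E)$ form a complete lattice.
The maximal element is the hierarchy having $E$ as its only element and the smallest
hierarchy contains all one-pixel sets such as $\{x\}$.

The infimum of two hierarchies $\mathcal{A}$ and $\mathcal{B}$ is written $\mathcal{A} \wedge \mathcal{B}$ and is defined by its
ultrametric distance $d_{\mathcal{A} \wedge \mathcal{B}} = d_{\mathcal{A}} \vee d_{\mathcal{B}}$. Its balls are defined by:
$\mathrm{Ball}_{\mathcal{A} \wedge \mathcal{B}}(X, \rho) = \mathrm{Ball}_{\mathcal{A}}(X, \rho) \cap \mathrm{Ball}_{\mathcal{B}}(X, \rho)$

The supremum of two hierarchies $\mathcal{A}$ and $\mathcal{B}$ is written $\mathcal{A} \vee \mathcal{B}$ and is the smallest
hierarchy larger than $\mathcal{A}$ and $\mathcal{B}$; as $d_{\mathcal{A}} \wedge d_{\mathcal{B}}$ is not an ultrametric distance, $d_{\mathcal{A} \vee \mathcal{B}}$ is the
subdominant ultrametric distance associated to $d_{\mathcal{A}} \wedge d_{\mathcal{B}}$. If $\mathcal{A}_\lambda$, $\mathcal{B}_\lambda$ and $(\mathcal{A} \vee \mathcal{B})_\lambda$ are
the partitions obtained by taking the balls of radius $\lambda$ in each of the three hierarchies,
then the boundaries of $(\mathcal{A} \vee \mathcal{B})_\lambda$ are all boundaries existing in both $\mathcal{A}_\lambda$ and $\mathcal{B}_\lambda$. The
infimum and supremum of two hierarchies are illustrated in fig. 2.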
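
As a hedged illustration of these two operations, the sketch below works on a finite set of elementary regions and represents each ultrametric distance as a symmetric matrix (a simplification: the text defines distances on all of $\mathcal{P}(E)$). The subdominant ultrametric is computed with a Floyd-Warshall style closure in which path length is replaced by the highest step on the path; this particular algorithm and the function names are choices made for the example, not taken from the paper.

```python
def infimum_distance(da, db):
    """Distance of the infimum hierarchy: pointwise maximum of the two distances."""
    return [[max(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]

def subdominant_ultrametric(d):
    """Largest ultrametric below d: for every pair, the smallest possible
    'highest step' over all chains linking the two elements."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

def supremum_distance(da, db):
    """Distance of the supremum hierarchy: subdominant ultrametric of the
    pointwise minimum, which is not an ultrametric in general."""
    lower = [[min(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]
    return subdominant_ultrametric(lower)

# Two assumed hierarchies on three regions: A merges 0 and 1 first, B merges 1 and 2 first.
d_a = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
d_b = [[0, 3, 3], [3, 0, 2], [3, 2, 0]]
print(infimum_distance(d_a, d_b))
print(supremum_distance(d_a, d_b))
```

In this example the pointwise minimum violates the ultrametric inequality for the pair (0, 2), and the closure lowers that entry to the value reached by chaining through region 1.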




Fig. 2. Two Hierarchies HA and HB and their Derived Supremum and Infimum.



2.3.2 Lexicographic Fusion of Stratified Hierarchies. Let $\mathcal{A}$ and $\mathcal{B}$ be two stratified
hierarchies, with their associated distances $d_{\mathcal{A}}$ and $d_{\mathcal{B}}$. In some cases, one of the
hierarchies correctly represents the image to segment, but with too small a number of
nested partitions. One desires to enrich the current ranking of regions as given by $\mathcal{A}$, by
introducing some intermediate levels in the hierarchy. The solution is to combine the
hierarchy $\mathcal{A}$ with another hierarchy $\mathcal{B}$ in a lexicographic order.

One produces the lexicographic hierarchy $\mathrm{Lex}(\mathcal{A}, \mathcal{B})$ by defining its ultrametric
distance; it is the largest ultrametric distance below the lexicographic distance $d_{\mathcal{A},\mathcal{B}}$
classically defined by

$d_{\mathcal{A},\mathcal{B}}(C, D) > d_{\mathcal{A},\mathcal{B}}(K, L) \Leftrightarrow$

$d_{\mathcal{A}}(C, D) > d_{\mathcal{A}}(K, L)$

or

$d_{\mathcal{A}}(C, D) = d_{\mathcal{A}}(K, L)$ and $d_{\mathcal{B}}(C, D) > d_{\mathcal{B}}(K, L)$

Fig. 3 presents two hierarchies HA and HB and the derived lexicographic hierarchies
$\mathrm{Lex}(\mathcal{A}, \mathcal{B})$ and $\mathrm{Lex}(\mathcal{B}, \mathcal{A})$.




Fig. 3. Two Hierarchies HA and HB and their Derived Lexicographic Combinations.



3 Watershed and Floodings 

3.1 The Watershed 

We now have to describe how to derive hierarchies from images and how to use them
for segmentation or filtering purposes. The answer will be flooding and watershed. The
watershed line is a topographical entity; a grey-tone image may indeed be considered as
a topographical surface, where each pixel has an altitude proportional to its grey-tone.
Let us now consider a drop of water falling on a topographic surface. If it falls outside a
regional minimum, it will glide along a path of steepest descent until it reaches a
minimum. The set of pixels drained by a given regional minimum forms the catchment basin
of this minimum. The watershed line is the boundary between catchment basins. The
trajectory of a drop of water falling on the surface is a geodesic line of a distance called
the topographic distance. If two pixels $p$ and $q$ belong to such a geodesic
line, the topographic distance between them is equal to the difference of altitudes
between both pixels: $|f_p - f_q|$. Assigning to all regional minima the value 0 does not
change the catchment basins, which may then be detected by a shortest distance
algorithm: the catchment basin of a minimum is the set of pixels with a shorter topographic
distance to this minimum than to any other minimum. In practice, a biased algorithm is
used in order to create a partition: the pixels which are at an equal distance from two
minima are assigned arbitrarily to one of the adjacent catchment basins. As a result of this
choice, the union of the catchment basins forms a partition. This partition is the finest
partition from which all other segmentations will be derived. In practice, an image is
segmented by constructing the catchment basins of its gradient image, as illustrated in
the following pictures.




[Figure panels: Image; Gradient; Watershed]



As each regional minimum generates a catchment basin, the obtained partition is
often fragmented into many tiny regions. We obtain a coarser partition if we reduce
the number of minima. Flooding a topographic surface is an efficient way to reduce the
number of its regional minima: after partial flooding, some catchment basins will be
completely flooded and absorbed by neighboring catchment basins. The next section
defines and presents the properties of morphological flooding.

3.2 Definition of a Flooding 

Notation: we write $g_p$ for the value of the function $g$ at pixel $p$. In what follows we
consider that the domain $E$ of the images is a discrete grid.

Definition 5. A function $g$ is a flooding of a function $f$ if and only if $g \geq f$ and for any
couple of neighboring pixels $(p, q)$: $g_p > g_q \Rightarrow g_p = f_p$

Definition 6. Two pixels $x$, $y$ belong to the same flat-zone of a function $f$ if and only if
there exists an $n$-tuple of pixels $(p_1, p_2, \ldots, p_n)$ such that $p_1 = x$ and $p_n = y$, and for all
$i$, $(p_i, p_{i+1})$ are neighbours and verify $f_{p_i} = f_{p_{i+1}}$.

“To belong to the same flat-zone” is an equivalence relation, whose equivalence
classes are precisely the flat-zones.

Let $g$ be a flooding of the function $f$. We call lake of $g$ any flat-zone of $g$ containing
at least a pixel $p$ for which $g_p > f_p$. Let $L$ be such a lake. If all neighbors of $L$ have a
higher altitude, then $L$ is a regional minimum. On the contrary, if $L$ has a lower neighbor
it is called a full lake: there exists a couple of neighboring pixels $(p, q)$, $p$ belonging to
$L$ and $g_p > g_q$. According to the definition of floodings, this implies that $g_p = f_p$,
meaning that the level of the flooding $g$ and the level of the ground $f$ are the same at
pixel $p$; hence the interpretation of the definition is simply that a lake cannot form a
wall of water without solid ground in front to hold the water. This is clearly illustrated
in fig. 4, where the right figure cannot be a valid flooding, whereas the left figure is a
valid one. The pixel $p$ is then necessarily a pass point of $g$: the altitude of $g$ decreases
from $p$ to the outside and the inside of the lake, and increases in both directions along
the outside boundary of the lake (the altitude remains stable if the lake has no higher
neighbor).

Fig. 4. A: A physically possible flooding; B: An impossible flooding, where a lake is
limited by a wall of water at position p.

By considering all neighbors of a central pixel, one easily derives the following
criterion from the definition of a flooding.

Criterion Flood: A function $g$ is a flooding of a function $f$ if and only if $g = f \vee \varepsilon g$,
where $\varepsilon g$ is the erosion of $g$ with a structuring element equal to the central pixel and its
first neighbors.

Remark: A flooding is a particular type of image filtering called leveling. There also
exists a PDE implementation for constructing it.
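
Criterion Flood translates directly into a pointwise test. The following sketch assumes one-dimensional signals stored as Python lists, with the structuring element reduced to a pixel and its left and right neighbours; the names `erosion` and `is_flooding` and the sample signals are illustrative.

```python
def erosion(g):
    """Erosion of a 1-D signal by the structuring element made of a pixel
    and its two neighbours (a running minimum, truncated at the borders)."""
    n = len(g)
    return [min(g[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]

def is_flooding(f, g):
    """Criterion Flood: g is a flooding of f iff g equals the pointwise
    maximum of f and the erosion of g."""
    if len(f) != len(g):
        return False
    return all(gi == max(fi, ei) for fi, gi, ei in zip(f, g, erosion(g)))

ground = [3, 1, 2, 0, 4]
print(is_flooding(ground, [3, 2, 2, 2, 4]))   # a lake at level 2 held by solid ground
print(is_flooding(ground, [3, 3, 2, 2, 4]))   # pixel 1 forms a wall of water
print(is_flooding(ground, ground))            # no water at all
```

The second candidate fails because its value at pixel 1 exceeds that of its right neighbour without resting on the ground, which is the "wall of water" situation ruled out by the definition.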



3.3 Properties of Floodings 



Creation of Lakes. Any flooding $g$ of a function $f$ creates a number of lakes on the
topographic surface of $f$. All connected components where $g > f$ are flat, as shown by
the following property, immediately derived from the definition:

for any couple of neighboring pixels $(p, q)$: $g_p > f_p$ and $g_q > f_q \Rightarrow g_p = g_q$

Algebraic Properties. It is easy to check using their definition that:

* If $g$ and $h$ are two floodings of $f$, then $g \vee h$ and $g \wedge h$ are also floodings of $f$.

* If $g$ and $h$ are floodings of $f$ and $g \geq h$, then $g$ is also a flooding of $h$. As a matter of
fact, if $h$ is a flooding of $f$ then $h \geq f$. But $g \geq h$. Hence $g \geq h \geq f$.
On the other hand, $g$ is a flooding of $f$, implying for all neighbors $(p, q)$: $g_p > g_q \Rightarrow$
$f_p = g_p$. But since $g \geq h \geq f$, this implies $g_p = h_p$.

* The relation {$g$ is a flooding of $f$} is reflexive, antisymmetric and transitive; it is an
order relation.

In particular, if $f$ and $h$ are two functions such that $f \leq h$, then the family of floodings
$(g^i)$ of $f$ verifying $g^i \leq h$ forms a complete lattice for this order relation. The smallest
element is $f$ itself. The largest is called the flooding of $f$ constrained by $h$ (see fig. 5).
It is obtained by repeating the geodesic erosion of $h$ above $f$: $h^{(n+1)} = f \vee \varepsilon h^{(n)}$,
starting from $h^{(0)} = h$, until stability, that is until $h^{(n+1)} = h^{(n)}$. The criterion Flood
given above shows that the result at convergence is indeed a flooding of $f$. Convergence
may be obtained faster when using a recursive or a data-driven implementation of the
algorithm using hierarchical queues. This operation is also known as the reconstruction
closing of $f$ using $h$ as marker.
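
A naive version of this iteration, without the hierarchical-queue speed-up mentioned in the text, might look as follows. The sketch again assumes one-dimensional signals stored as lists and a three-pixel structuring element; `constrained_flooding` is a hypothetical name.

```python
def erosion(g):
    """Running minimum over a pixel and its two neighbours (1-D signal)."""
    n = len(g)
    return [min(g[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]

def constrained_flooding(f, h):
    """Largest flooding of f lying below h: iterate the geodesic erosion
    g <- max(f, erosion(g)), starting from g = h, until nothing changes."""
    if len(f) != len(h) or any(hi < fi for fi, hi in zip(f, h)):
        raise ValueError("the constraint h must satisfy h >= f everywhere")
    g = list(h)
    while True:
        nxt = [max(fi, ei) for fi, ei in zip(f, erosion(g))]
        if nxt == g:          # stability reached
            return g
        g = nxt

ground = [3, 1, 2, 0, 4]
constraint = [5, 5, 5, 2, 5]      # low only at pixel 3: the water level is pinned there
print(constrained_flooding(ground, constraint))
```

Each pass lowers the function by at most one pixel of propagation, so the number of passes grows with the extent of the lakes; this is the inefficiency that recursive or queue-based implementations avoid.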



Fig. 5. $g = \mathrm{Fl}(f, h)$ is the flooding of $f$ (blue function) constrained by the function $h$ (red
function).



* These properties permit various constructions of increasing families of floodings $(g^i)$:
it is necessary and sufficient that $g^i$ is a flooding of $g^{i-1}$.

Construction of Floodings.

Uniform Flooding. A flooding of a function $f$ is uniform if the level of all lakes is the
same everywhere. A flooding at level $\lambda$ is obtained simply by threshold:

$$f^\lambda = \begin{cases} f & \text{if } f > \lambda \\ \lambda & \text{if } f \leq \lambda \end{cases}$$

Flooding with a Hierarchical Queue. A hierarchical queue is the ideal tool to
implement floodings. A hierarchical queue is a series of queues (working on a first in
first out basis), each with a priority level; lower altitudes mean higher priority. A
pixel entering the hierarchical queue is put in the queue whose priority corresponds
to its altitude. Only one pixel is able to leave the queue at a time: it is the first pixel that
entered the queue with highest priority. If we want to flood a surface $f$ from a source
$X$ (a subset of $E$), we put all outside neighbors of $X$ in the hierarchical queue. Each
pixel leaving the queue is immediately flooded and all its unflooded neighbors are put
into the queue. If the altitude of such a neighbor is below the current flooding level,
the lake has attained the level of the smallest pass point on the boundary of the current
catchment basin. If we stop after emptying the current queue, we will have produced a
lake which is called a full lake, as it is not a regional minimum anymore; continuing the
flooding will produce an overflow into the neighboring catchment basin.

Hierarchical queues naturally produce a flooding: lower pixels are flooded before
higher pixels; within a plateau, pixels are processed in the order of their distance to the
lower border of the plateau. The same flooding mechanism is used for the construction
of the watershed of a function f:

(a) Initialisation: give a label to each regional minimum and put its inside boundary
pixels in the hierarchical queue.

(b) If all queues are empty: END. Else, take a node j from the non-empty queue of highest priority (lowest altitude).

(c) For each unlabeled neighbouring node i of j do:

label(i) = label(j);

put i in the queue with priority p(i);

Then return to (b).
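Steps (a) to (c) can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the image is a 2D list with 4-connectivity, the labelled regional minima are supplied by the caller, and the hierarchical queue is emulated with Python's `heapq` plus an insertion counter that preserves first in, first out order within one altitude. The function name `flood_from_minima` is hypothetical.

```python
import heapq
from itertools import count


def flood_from_minima(f, labels):
    """Propagate labels of regional minima over f in order of increasing altitude.

    f: 2D list of altitudes; labels: 2D list, > 0 on labelled minima, 0 elsewhere.
    """
    rows, cols = len(f), len(f[0])
    out = [row[:] for row in labels]
    order = count()  # tie-breaker: first in, first out within one altitude
    queue = []

    def neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    # (a) queue the labelled pixels that touch an unlabelled pixel
    for r in range(rows):
        for c in range(cols):
            if out[r][c] and any(out[rr][cc] == 0 for rr, cc in neighbours(r, c)):
                heapq.heappush(queue, (f[r][c], next(order), r, c))

    # (b) take the pixel of lowest altitude; (c) label and queue its neighbours
    while queue:
        _, _, r, c = heapq.heappop(queue)
        for rr, cc in neighbours(r, c):
            if out[rr][cc] == 0:
                out[rr][cc] = out[r][c]
                heapq.heappush(queue, (f[rr][cc], next(order), rr, cc))
    return out


profile = [[0, 1, 3, 2, 0]]
minima = [[1, 0, 0, 0, 2]]
print(flood_from_minima(profile, minima))  # [[1, 1, 1, 2, 2]]
```

In this one-row example the pixel of altitude 3 is reached first from its lower neighbour of altitude 1, so it receives label 1.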



4 Hierarchy Associated to an Ordered Series of Floodings 

4.1 Watershed and Floodings: Absorption of Catchment Basins During 
Flooding 

If g is a flooding of f, how do the catchment basins of g relate to those of f? We will
call $CB_f$ the catchment basins of f and $CB_g$ those of g. Let $m_i$ be a regional minimum
of f and $Y_i$ the associated $CB_f$. Two cases are possible. 1) $m_i$ is not covered by a lake.
$m_i$ is then also a regional minimum of g and $Y_i$ is included in a $CB_g$, which may have
absorbed some full lakes of g. 2) $m_i$ is covered by a lake L. Then if L is a regional
minimum of g, $Y_i$ is included in the $CB_g$ of L. On the contrary, if L is not a regional
minimum of g, then L is a full lake and belongs to the same catchment basin as its lower
neighbors. This analysis shows that in all cases each $CB_f$ is included in a $CB_g$. Hence
the partition of the $CB_f$ is finer than the partition of the $CB_g$.

4.2 Hierarchy of the Catchment Basins 

Let us now consider a family of increasing floodings $(g^i)$ of function f, verifying
$g^i \le g^j$ for $i < j$, and $g^0 = f$. As we have seen earlier, $g^j$ is then a flooding of
$g^i$ for $i < j$. According to the previous section, the partition of the $CB_{g^i}$ is finer than the
partition of the $CB_{g^j}$. The catchment basins of the family $(g^i)$ are nested and form a
hierarchy, for which we now characterise the associated ultrametric distance $d_g$ and the
minimum spanning tree $T_g$. Consider two catchment basins of f, $X_1$ and $X_2$, associated
to the minima $m_1$ and $m_2$. As we have seen in section 2.2.2, the ultrametric distance
$d_g(X_1, X_2)$ between them is the smallest index k such that $X_1$ and $X_2$ belong to the
same catchment basin of $g^k$ within the family $(g^i)$; furthermore the hierarchy can be
represented by a MST $T_g$.

We will now state and establish an important property of all flooding hierarchies of
a given function f.

Proposition: All minimum spanning trees representing a hierarchy associated to a series
of increasing floodings of a given function f have the same nodes and edges; they
only differ by the distribution of weights.

In order to establish this result, we first characterize the minimum spanning tree $T_f$
associated to the hierarchy produced by uniform flooding. We call T the unweighted
tree having the same nodes and edges as $T_f$. We then come back to the general case
of an arbitrary family of increasing floodings $(g^i)$ and show how a new distribution of
weights may be found on T in order to generate the hierarchy associated to $(g^i)$.



4.2.1 Hierarchy Associated to Uniform Flooding. Let G be the RAG of the topographic
surface of f: its nodes are the $CB_f$, and neighboring nodes are connected by an
edge with a weight equal to the altitude of the smallest pass point between them. Between
two minima $m_1$ and $m_2$ of the topographic surface f, there exist many paths;
among them there exists a path with lowest sup-section; its highest point is a pass point
p of f with altitude $f_p$. This path crosses a number of catchment basins of f, whose
nodes also form a path of lowest sup-section on the RAG G. Hence this path belongs
to the minimum spanning tree $T_f$ of G.

We now consider a uniform flooding of the function f, that is the family $f^\lambda$ defined
in section 3.3. The lowest level of flooding for which both minima $m_1$ and $m_2$ will
be covered by a same lake and the associated catchment basins $X_1$ and $X_2$ be merged
is then equal to $f_p$, the highest altitude on the path with lowest sup-section
between $X_1$ and $X_2$. This shows that the MST $T_f$ effectively represents the hierarchy
associated to uniform flooding.
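The following sketch illustrates this property on a small hypothetical RAG. It assumes the RAG is given as a list of `(pass_altitude, basin_u, basin_v)` edges; it builds the minimum spanning tree with Kruskal's algorithm (our choice, the text does not prescribe one) and reads the merging level of two basins as the highest weight on the tree path joining them. The names `minimum_spanning_tree` and `merge_level` are ours.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm. edges: list of (weight, u, v). Returns the MST edges."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def merge_level(num_nodes, tree, a, b):
    """Highest weight on the unique tree path from a to b."""
    adjacency = {n: [] for n in range(num_nodes)}
    for weight, u, v in tree:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))
    stack = [(a, None, float("-inf"))]
    while stack:
        node, previous, highest = stack.pop()
        if node == b:
            return highest
        for nxt, weight in adjacency[node]:
            if nxt != previous:
                stack.append((nxt, node, max(highest, weight)))
    return None


# Hypothetical RAG with 4 catchment basins; weights are pass-point altitudes.
rag = [(5, 0, 1), (3, 1, 2), (7, 0, 2), (4, 2, 3), (9, 1, 3)]
tree = minimum_spanning_tree(4, rag)
print(tree)                        # [(3, 1, 2), (4, 2, 3), (5, 0, 1)]
print(merge_level(4, tree, 0, 3))  # 5
print(merge_level(4, tree, 1, 3))  # 4
```

Basins 0 and 3 are joined in the tree by edges of weight 5, 3 and 4, so under uniform flooding they merge once the level reaches 5.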



4.2.2 Hierarchy Associated to Arbitrary Floodings. Calling T the unweighted tree
having the same nodes and edges as $T_f$, we now show that for an arbitrary series of
increasing floodings $(g^i)$ it is possible to find a distribution of weights on the edges of
T, such that it represents the hierarchy associated to $(g^i)$. This means that suppressing
all edges of T with a weight superior to k produces a number of subtrees, each
representing a catchment basin of $g^k$.

We first assign to all edges of T a weight equal to $\infty$. T is then able to represent the
catchment basins of the zero flooding $g^0 = f$: suppressing all edges of the tree T with
a positive weight creates isolated nodes, each of them representing a catchment basin
of f. Let us now suppose that we have found a distribution of weights on T such that
all hierarchies up to level i can be represented.

Let us show which weights have to be modified such that T also represents the
catchment basins of $g^{i+1}$. Suppose that X and Y are catchment basins of $g^i$, which
merge into a unique catchment basin Z of $g^{i+1}$. By hypothesis, X and Y can be correctly
represented by the tree T. In order to show that it is also the case of Z, we
imagine a series of progressive floodings $h^k$ which transform $g^i$ into $g^{i+1}$ such that
progressive fusions of catchment basins in this series of floodings can all be represented by
edges of T:

1) Initialisation: $k = 0$; $h^0 = g^i$;

2) choose a regional minimum W of $h^k$ such that $h^k(W) < g^{i+1}(W)$.

* If such a minimum is found, flood the corresponding catchment basin until either
(a) the level of $g^{i+1}(W)$ is reached or (b) a full lake is created. As a result we get
$h^{k+1}$, a new flooding of f. In case (a) $h^k$ and $h^{k+1}$ have the same catchment basins.
In case (b) the full lake covering W has reached a pass point which corresponds to an
edge of T; we assign to this edge a weight equal to $(i + 1)$. In this way the evolution of
catchment basins between $h^k$ and $h^{k+1}$ can be expressed by the tree T.

* If there is none, this means that $h^k = g^{i+1}$ and we may exit, having produced a
distribution of weights on T correctly representing $g^{i+1}$.

3) do $k = k + 1$ and go to (2).

The fact that a unique spanning tree is able to generate all hierarchies associated to
increasing floodings is an important factor of speed and simplicity for morphological
segmentation algorithms: it is sufficient to construct the tree T once and derive from
it all morphological segmentation results, just by changing its weights. Furthermore, T
represents a huge reduction in the amount of information to process: it is dimensionless
whatever the number of dimensions of the domain E and each of its nodes represents a
whole catchment basin of f. Being a tree, it has N - 1 edges for N nodes.
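To make the role of the weights concrete, here is a small sketch, assuming the tree is stored as a list of `(weight, u, v)` edges whose weight is the index of the flooding at which the two sides fuse. Suppressing all edges with a weight above k and grouping the remaining connected nodes gives the catchment basins at level k. The function name `basins_at_level` and both weightings are hypothetical.

```python
def basins_at_level(num_nodes, tree, k):
    """Group the nodes connected by tree edges whose weight is at most k."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for weight, u, v in tree:
        if weight <= k:
            parent[find(u)] = find(v)

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Same unweighted tree 0-1-2-3, two different distributions of weights.
weights_a = [(1, 0, 1), (2, 1, 2), (1, 2, 3)]
weights_b = [(2, 0, 1), (1, 1, 2), (2, 2, 3)]
print(basins_at_level(4, weights_a, 0))  # [[0], [1], [2], [3]]
print(basins_at_level(4, weights_a, 1))  # [[0, 1], [2, 3]]
print(basins_at_level(4, weights_b, 1))  # [[0], [1, 2], [3]]
print(basins_at_level(4, weights_a, 2))  # [[0, 1, 2, 3]]
```

Only the weights change between the two hierarchies; the nodes and edges of the tree are the same.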

4.3 Useful Families of Floodings 

We now indicate the principal and most useful families of floodings used in
morphological and multiscale segmentations. As a matter of fact, the quality of segmentation
will depend to a great extent on the family of floodings on which it is built.
We already presented the uniform flooding, which is the simplest.

4.3.1 Size Oriented Flooding. Size oriented flooding may be visualised as a process 
where sources are placed at each minimum of a topographic surface and pour water 
in such a way that all lakes share some common measure (height, volume or area of 
the surface). As the flooding proceeds, some lakes eventually become full lakes, as the 
level of the lowest pass point has been reached. Let L be such a full lake. The source 
of L stops pouring water and its lake is absorbed by a neighboring catchment basin X, 
where an active source is still present. Later the lake present in X will reach the same 
level as L, both lakes merge and continue growing together. Finally only one source 
remains active until the whole topographical surface is flooded. The series of floodings 
indexed by the measure of the lakes generates a size oriented hierarchy. 

In fig. 6 a flooding starts from all minima in such a way that all lakes always have
uniform depth, as long as they are not full. The resulting hierarchy is called dynamics 
in case of depth driven flooding and has first been introduced by M.Grimaud[E||. Deep 
catchment basins represent objects which are contrasted ; such objects will take long 
before being absorbed by a neighboring catchment basin. The most contrasted one will 
absorb all others. This criterion obviously takes only the contrast of the objects into 
account and not their size. If we control the flooding by the area or the volume of 
the lakes, the size of the objects also is taken into consideration [|H1; in multimedia 
applications, good results are often obtained by using as measure the volume of the 
lakes, as if each source poured water with a constant flow. This is illustrated by the
following figures. The topographical surface to be flooded is a colour gradient of the 
initial image (maximum of the morphological gradients computed in each of the R, G 
and B colour channels). Synchronous volumic flooding has been used, and 3 levels of 
fusions have been represented, corresponding respectively to 15, 35 and 60 regions. 
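The three measures used to synchronise the lakes can be sketched for a single lake as follows. This is a simplified illustration: it assumes the pixels of one catchment basin are given as a flat list of altitudes and that all pixels below the lake level belong to the same lake. The function name `lake_measures` is ours.

```python
def lake_measures(basin_altitudes, level):
    """Return (depth, area, volume) of the lake at the given level in one basin."""
    flooded = [a for a in basin_altitudes if a < level]
    if not flooded:
        return 0, 0, 0
    depth = level - min(flooded)             # height of water above the minimum
    area = len(flooded)                      # number of flooded pixels
    volume = sum(level - a for a in flooded)  # water column summed over pixels
    return depth, area, volume


basin = [5, 2, 1, 3, 6]
print(lake_measures(basin, 4))  # (3, 3, 6)
```

A deep and narrow basin reaches a given depth with little volume, while a wide and shallow one accumulates area and volume quickly, which is why the three criteria rank regions differently.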

As a summary, the depth criterion ranks the regions according to their contrast, the
area according to their size and the volume offers a nice balance between size and contrast
as illustrated in fig. 7 where we have illustrated the differences between the criteria
used for controlling the progression of the lakes. The initial image and its gradient are
illustrated on the top row. Then 3 types of synchronous flooding are used. In the first
(bottom left) the lakes grow with uniform depth, resulting in a pyramid where the most
contrasted regions survive longest. In the second (bottom center) the area is used and
the largest regions are favoured. In the last (bottom right) the volume of the lakes is
used, offering a good balance between size and contrast of the regions. For each hierarchy
the partition with 70 tiles is selected and each tile replaced by its mean grey tone,
for the purpose of illustration.

Fig. 6. Example of a height synchronous flooding. Four levels of flooding are illustrated;
each of them is topped by a figuration of the corresponding catchment basins.

[Figure: initial image and the partitions with 15, 35 and 60 regions obtained by synchronous volumic flooding.]

4.3.2 Tailored Flooding for Favoring Some Types of Regions. In some cases, while 
using one of the size criteria, it appears desirable to favor some regions. This happens if 
one knows beforehand that regions with some particular characteristics are important. 
As an example: in many cases, the topographic surface to be flooded is the gradient 
image dh of an image h. The catchment basins of dh correspond to flat zones in h, 
which may be regional minima, maxima or step zones. However minima and maxima 
of h are perceptually more important than transition flat zones. For this reason, it may 
be worthwhile to push minima and maxima of h higher in the hierarchy. 

It is easy to obtain this result during synchronous flooding: by reducing the rate 
of flow in the corresponding minima. The more important a region is, the more the 






Fernand Meyer 




Fig. 7. Top: Initial image and gradient image.

Bottom: 3 partitions with 70 regions each. 3 different geometric criteria have been used
during synchronous flooding: on the left, the depth of the lakes, in the centre the area
and on the right the volume of the lakes.




Fig. 8. 4 levels of tailored synchronous flooding, where the minimum marked red is 
slowed down by a factor 5. As a result we show the corresponding segmentation into 3 
regions compared to the segmentation in 3 regions if no source is slowed down. 
flow of its minima has to be reduced. In fig. 8 we have a case where depth synchronous
flooding is performed. However the depth of the minimum marked by a black bar grows
five times slower than the depth in the other catchment basins. For this reason, this
particular minimum resists absorption much longer. The interest of such markers
is clearly illustrated in fig. 9 where two segmentations without and with slowing down
the flooding are compared. A fine partition is created first; the colour flat zones are
detected and the largest of them serve as markers for flooding a colour gradient image
(top right picture). Then a second gradient image is constructed on the boundaries of
the fine partition and this new image is flooded according to volumic criteria. The result is
illustrated by the bottom pictures. On the left, the rate of flow is the same in all minima;
on the right, regions have been selected by hand in the faces of the angels, and their rate
of flow reduced by a factor 50. Then 2 partitions have been selected in the hierarchy
with the same number of regions, showing that the faces of the angels merge with the
background if their flooding is not slowed down.



4.3.3 Flooding in the Presence of Markers. Markers are a limit case of the preceding
situation. One wishes that the marked regions are present at the top of the hierarchy.
This will be the case if the rate of flow in the marked minima is infinitely slowed down;
in other terms such minima have no source at all. Hence they stay minima for ever, and
catch their neighboring basins as illustrated in fig. 10. If there are N marked minima, cutting the
N - 1 highest edges of the MST yields a partition of N regions, containing a marker
each. Cutting more than N - 1 edges shows how the regions are further subdivided in
finer segmentations; in this case, the criterion used for controlling the flooding (depth,
area or volume of the lakes) has an effect on the finer segmentations.

It is interesting to observe the resulting flooding at convergence; the only remaining
minima are the marked minima, all others are full lakes. Let us consider two marked
minima $m_1$ and $m_2$ for which the resulting catchment basins $X_1$ and $X_2$ are neighbors.
The lowest pass point between $X_1$ and $X_2$ corresponds to an edge of the minimum
spanning tree $T_f$ of the RAG; more precisely it is the edge with the highest valuation
in the unique path joining $m_1$ and $m_2$ within $T_f$. From this we infer another method of
segmenting with markers: cut the edge with the highest weight on the MST $T_f$ between
any pair of markers. Suppressing on each of these paths the edge with highest weight
produces a minimum spanning forest, where each tree is rooted in a marker. This
method of segmentation with markers is extremely powerful as it only uses the minimum
spanning tree. Of course, the minimum spanning tree representing the hierarchy
of any series of increasing floodings can be used. Traditionally morphological segmentation
uses the hierarchy of uniform flooding; however in some cases better results are
obtained by using another hierarchy, for instance a size oriented hierarchy. This opens
the way to a new segmentation mode with markers: in a first stage construct a hierarchy
which is well adapted to the problem; for instance use volumic size flooding, in
order to obtain a hierarchy which better represents the relative importance of the different
regions. And in a second stage, use the set of weights of this new hierarchy for
segmenting with markers.
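A minimal sketch of segmentation with markers on the tree, under stated assumptions: the tree is a list of `(weight, u, v)` edges with distinct weights, and the markers are given as a mapping from node to marker name. Rather than searching each path between two markers, the sketch adds the tree edges in order of increasing weight and refuses any edge that would join two marked components; for distinct weights this leaves out exactly the highest edge between neighbouring markers. The function name `segment_with_markers` is ours.

```python
def segment_with_markers(num_nodes, tree, markers):
    """Return the marker assigned to each node of the tree.

    tree: list of (weight, u, v); markers: dict mapping node -> marker name.
    """
    parent = list(range(num_nodes))
    label = [markers.get(node) for node in range(num_nodes)]  # label held by roots

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for weight, u, v in sorted(tree):
        root_u, root_v = find(u), find(v)
        if label[root_u] is not None and label[root_v] is not None:
            continue  # this edge separates two markers: it is cut
        parent[root_u] = root_v
        if label[root_v] is None:
            label[root_v] = label[root_u]
    return [label[find(node)] for node in range(num_nodes)]


tree = [(3, 1, 2), (4, 2, 3), (5, 0, 1)]
print(segment_with_markers(4, tree, {0: "A", 3: "B"}))  # ['A', 'B', 'B', 'B']
```

On the path from node 0 to node 3 the weights are 5, 3 and 4; the edge of weight 5 is suppressed, so node 0 forms one region and nodes 1 to 3 the other. Replacing the weights by those of another hierarchy changes the result without rebuilding the tree.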

Finally, size oriented flooding, tailored flooding and flooding with markers may be
regrouped: each minimum may be considered as a fuzzy marker, by assigning to it a
fuzzy level: 1 means a hard marker, where no source is placed; 0 means no marker at
all, and the source is not slowed down; $\lambda$ means a fuzzy marker, and the corresponding
source is slowed down by a factor $\lambda$. Fuzzy markers make it possible to establish a continuum
between traditional multiscale segmentation and segmentation with markers.

Fig. 9. Top row: Initial image and fine segmentation.

Bottom row: Segmentation without and with fuzzy markers placed in the faces of the
angels. Both partitions have the same number of regions.

4.3.4 Cataclysmic Floodings. A flooding g of a function f is cataclysmic if each
catchment basin of f is occupied by a full lake. Some of these lakes are regional minima
of g; others are not. The catchment basins of g constitute the first level of the hierarchy
(see fig. 11). The resulting function g itself may then be submitted to a new cataclysmic
flooding and again the number of catchment basins will be strongly reduced. Repeating
this flooding in sequence a few times generally produces an image where only one
region remains.

A cataclysmic flooding of an image f is easy to produce through a constrained
flooding. The constraining function is equal to f on the watershed line of f and equal
to $\infty$ everywhere else. The process is illustrated in 1 dimension in fig. 12 and also for a
single basin in 2 dimensions in fig. 13.

Fig. 10. Flooding in the presence of markers. The catchment basins with markers have
no source at all. The arrows show in which order the catchment basins are absorbed one
by another.




Fig. 12. Constrained flooding for producing a cataclysmic flooding. 



Repeating the same extremal flooding on the result of the first extremal flooding 
again will drastically reduce the number of catchment basins. This process may then be 
repeated until a partition is created with only one catchment basin. Like that we obtain 
a series of nested partitions which decreases extremely rapidly. 
Fig. 13. Cataclysmic flooding: the constraining function is equal to f on the watershed
line and to the maximal value everywhere else.



5 Application to Segmentation 

Segmenting an image is an extremely difficult task. Using hierarchies offers a great
help: a hierarchy offers a set of possible contours for the segmentation to construct.
Instead of searching the contours blindly among all pixels of the image, segmenting
through a hierarchy will select only contours present in one of the nested partitions
of the hierarchy. Furthermore, each contour is weighted by the rank of the finest
partition in which it does not exist anymore. For this reason, a number of segmentation
tasks may be expressed as finding the strongest contours between a set of markers, or
finding the best segmentation into a predefined number of regions. Hierarchical segmentation
also offers new ways for interactive segmentation: for each part of the image, one
is able to choose the scale for which the segmentation is easiest. For this reason
it is important to be able to construct extremely diverse hierarchies in order to offer for
each segmentation problem the hierarchy which presents the best scale of contours: one
may favour the contrast of the regions, their colour, their size. One may also, through
tailored flooding, favour some regions compared to others, as being more important.
We now present the most frequent segmentation scenarios.



5.1 Unsupervised Segmentation 

The primary aim of a hierarchical approach is to be able to easily produce segmentations
with an arbitrary number of regions. To the question “what is the best segmentation
into n regions”, the answer is easy. One produces a hierarchy by flooding the gradient of the
image and chooses the stratification index for which a partition with n tiles
is produced. Such situations are met in object oriented coding applications: the image
is encoded as a partition. To each degree of compression corresponds a partition with
a given number of regions. Such an encoder must be able to segment an image into any
number of regions.
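As a sketch of this selection, assume the hierarchy is stored as a spanning tree given by a list of `(weight, u, v)` edges over the finest regions. A partition into n regions is then obtained by suppressing the n - 1 edges of highest weight and grouping what remains connected. The function name `partition_into` is hypothetical, and ties between equal weights are broken arbitrarily here.

```python
def partition_into(num_nodes, tree, n_regions):
    """Cut the (n_regions - 1) highest edges of the tree and return the regions."""
    if not 1 <= n_regions <= num_nodes:
        raise ValueError("n_regions must lie between 1 and the number of nodes")
    kept = sorted(tree)[: num_nodes - n_regions]  # the lowest edges survive
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)

    regions = {}
    for node in range(num_nodes):
        regions.setdefault(find(node), []).append(node)
    return sorted(regions.values())


tree = [(3, 1, 2), (4, 2, 3), (5, 0, 1)]
print(partition_into(4, tree, 2))  # [[0], [1, 2, 3]]
print(partition_into(4, tree, 3))  # [[0], [1, 2], [3]]
```

Because the tree has one edge fewer than nodes, keeping the `num_nodes - n_regions` lowest edges always leaves exactly `n_regions` connected groups.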
5.2 Segmentation with Markers

Flooding with markers is the most used morphological segmentation method [d |d, 
as it permits an easy and robust way to introduce semantics into the segmentation pro- 
cess: the objects to segment are first recognised and a marker is produced. It may be the 
segmentation produced in the preceding frame when one has to track an object in a se- 
quence. It may also be some markers produced either by hand or automatically. Position 
and shape of the marker have no importance, yielding robustness. In a second phase the
contours are found. We have seen earlier how to flood a topographic surface in presence
of markers. The analysis of the result has shown that segmenting with markers amounts
to suppressing the highest edge on the unique path linking two markers within the MST
T. Interactive segmentation may be implemented through markers: a first set of markers is
defined by hand or automatically. The corresponding segmentation is constructed. Then
the result is corrected by manually editing the set of markers. The update of the partition
after each edit may be done without a new flooding.

As we have seen in section 4.3.3, segmenting with markers amounts to constructing a
minimum spanning forest, by cutting from T the edges with the highest weights between
all pairs of markers. An added marker $m_i$ will belong to a tree of this forest,
which is rooted in a marker $m_k$. Within this tree, there exists a unique path between
$m_k$ and $m_i$; cutting the highest edge on this path yields the new segmentation. Alternatively,
suppressing a marker $m_k$ means assigning the corresponding tree to an adjacent
tree, by adding to it the edge of T with lowest weight, leading to one of the neighboring
trees.

5.3 Interactive Segmentation 

Besides the traditional segmentation technique based on markers, new interactive seg- 
mentation techniques based on hierarchies are under development [El. A hierarchy is 
constructed and explored with a pointing device such as a mouse. A mouse position is
defined by its (x, y) coordinates in the image but also by its depth z in the segmentation
tree. If the mouse is active, the whole tile containing the cursor is activated and added
to or suppressed from the segmentation mask. For the same (x, y) position, a mouse
displacement towards lower levels of the hierarchy will result in a resegmentation of the
region, whereas a displacement towards higher levels represents a fusion of adjacent regions.
The desired segmentation is constructed as a painting process, in which the brush
adapts its shape to the boundaries of the object: higher levels of the hierarchy produce a
larger brush, lower levels a smaller brush (see fig. 14).



6 Hierarchies and Filtering 

Each morphological notion has a dual counterpart. Let us consider a grey tone image
f. If we negate f, flood -f and again negate the result, we will have suppressed some
peaks of the topographic surface. This dual operation of flooding is called razing. To
the constrained flooding Fl(f, h) presented earlier corresponds a constrained razing
Rz(f, h). Similarly, if we construct the watershed line Wsh of the function -f and
negate the result, we obtain the thalweg line of f. As we have seen earlier, cataclysmic
flooding is the largest flooding of f below the constraining function equal to f on the
watershed line of f and equal to the maximal value everywhere else. After such a
cataclysmic flooding, all CB contain a full lake. The dual operator would be cataclysmic
razing.

Fig. 14. Top row: a) initial image; b) and c) two levels of the hierarchy associated to
volumic flooding.

Bottom row: An initial partition of the hierarchy is selected. Some of its regions are
split and others merged until the desired result is obtained.




Fig. 15. Cataclysmic flooding and razing for filtering a topographic surface. 



Fig. 15 presents a CB in which a full lake has taken place. The boundary of this
CB is a portion of the watershed line on which 3 regional maxima may be seen. These
regional maxima are separated by portions of the thalweg line. As illustrated, the watershed
line and the thalweg line intersect at the position of saddle points. The smallest
saddle point on the boundary of a catchment basin becomes the level of the full lake
whereas the highest saddle point on the piece of thalweg surrounding a regional maximum
becomes the level of the full razing. So if a regional minimum and a regional
maximum have a saddle point s in common then the full lake filling this regional minimum
will be below or equal to $f_s$ and the level of the full razing lowering the regional
maximum will be higher than $f_s$. If a regional minimum and a regional maximum are
neighbors but have no common saddle point, then the level of flooding in one and razing
in the other are not coupled; it may then happen that the altitude of a pixel should
increase during cataclysmic flooding and at the same time decrease if a cataclysmic razing
is done in parallel. To avoid such situations we perform a cataclysmic flooding
followed by a cataclysmic razing, or alternatively a cataclysmic razing followed by a
cataclysmic flooding. Both possibilities give the same result in all parts of the image
where the watershed and thalweg lines cross. We present 5 successive steps of
simplification of the same image. In order to have better control over the result, one may
wish to avoid cataclysms; each step of cataclysmic razing fills a number of full lakes
and suppresses a number of full blobs; one may select a number of them for reintroduction
in the image. For instance, one may wish to keep the most contrasted ones and
only suppress the smaller ones, corresponding to noise or less significant details. Other
criteria, like area or volume of blobs and lakes, may also be used.



7 Conclusion 

Hierarchies of nested partitions offer a powerful representation of images as they convey
all potential contours for a given segmentation task. We have established the link
between floodings and hierarchies through the watershed transform. All traditional morphological
segmentation methods may be expressed in this framework and new ones
may be devised, as we have seen with interactive segmentation. As segmenting an image
is extremely difficult, it is important to have a large toolbox available. We
do not claim to be able to segment every image. However we have large degrees of
freedom in the choice of the right tools for a given task: choice of the best hierarchy,
introduction of extraneous knowledge or semantics through fuzzy markers and the possibility
to implement a versatile mode of interaction.
Abstract. The main theorem we present is a version of a “Folklore
Theorem” from scale-space theory for nonnegative compactly supported
functions from $\mathbb{R}^n$ to $\mathbb{R}$. The theorem states that, if we take the scale in
scale-space sufficiently large, the Gaussian-blurred function has only one
spatial critical extremum, a maximum, and no other critical points.

Two other interesting results we obtain concerning nonnegative compactly
supported functions are:

1. a sharp estimate, in terms of the radius of the support, of the scale 
after which the set of critical points consists of a single maximum; 

2. all critical points reside in the convex closure of the support of the 
function. 

These results show, for example, that all catastrophes take place within a 
certain compact domain determined by the support of the initial function 
and the estimate mentioned in 1. 

To illustrate that the restriction of nonnegativity and compact support 
cannot be dropped, we give some examples of functions that fail to sat- 
isfy the theorem, when at least one assumption is dropped. 

Keywords and Phrases. Large-scale behavior, loss of detail, nonneg- 
ative function, compact support, spatial critical point, deep structure. 



1 Introduction 

In this paper we discuss a “Folklore Theorem” from scale-space that could be 
roughly stated as follows: if a function $f : \mathbb{R}^n \to \mathbb{R}$ is blurred sufficiently, then
the blurred function has a single critical point, which is an extremum. The in- 
tuitive idea behind this theorem is appealing: if scale is taken sufficiently large, 
every detail of the function will be lost (as scale increases “images become less 
articulated”, one sees the “erosion of structure” as Koenderink calls it 0), be- 
cause they are too small with respect to the scale. All that is left is a blurred 
function virtually indistinguishable from a single Gaussian blob. This theorem, 
however, is false. The principal aim of this article is the formulation of certain
restrictions on the functions considered that make the Folklore Theorem
hold.

As we will see in Subsection 3.1, the falseness is already shown by a simple
example like the nth order (n > 0) derivative of a one-dimensional Gaussian, or
an arbitrary periodic function.

The theorem we present (Subsection 3.2) states that for nonnegative functions
$f : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$ (where $\mathbb{R}_{\ge 0}$ stands for the set of nonnegative real numbers)
with compact support, it does hold that $f_\sigma$, which is f at scale $\sigma$, has only one
critical point for all $\sigma$ larger than a certain scale c. To show that the restriction
of nonnegativity or compact support cannot be dropped in general, we discuss
in Subsection 3.3 some examples which illustrate this. Especially the example
in which we retain the compact support, but drop the nonnegativity, is counterintuitive
and an interesting observation.

A further result we present is a sharp estimate of the scale in terms of the 
radius of the support, after which we certainly have a unique maximum. This, 
together with a result presented in Sectional which states that all spatial critical 
points of a blurred function reside in the convex closure of the support of the 
initial function, gives us a certain bounded domain within scale-space to which 
attention can be restricted, when for example, tracking down catastrophes 
finger-prints jSj , or other potentially interesting features |3| . 

Most of the results are presented and proven in quite a formal way. How- 
ever, the proofs of these results are postponed to Appendix IXI to keep the text 
more readable. The next section gives some definitions and notations, which are 
used in the remainder of the article. Section El provides the discussion and the 
conclusions. 

2 Definitions and Notations 

We start with the formal definitions of the support of a function, and the radius 
of the support. 

Definition 1. The support supp(f) of a function $f : \mathbb{R}^n \to \mathbb{R}$ is

$$\operatorname{supp}(f) := \overline{\{x \in \mathbb{R}^n \mid f(x) \neq 0\}},$$

where the line indicates that the closure of the set is taken. The radius r(f) of the
support of the function f is defined as the smallest radius of a ball containing
supp(f).

We also define Gaussian blurring of a function from which scale-space is generated.

Definition 2. Let the Gaussian kernel $g_\sigma : \mathbb{R}^n \to \mathbb{R}$, for $\sigma \in \mathbb{R}_{>0}$, be defined
as:

$$g_\sigma(x) := \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\frac{\|x\|^2}{2\sigma^2}}.$$

Furthermore, given a function $f : \mathbb{R}^n \to \mathbb{R}$, let the function $f_\sigma : \mathbb{R}^n \to \mathbb{R}$ be
defined as:

$$f_\sigma := g_\sigma * f,$$

to which we refer as the blurred function $f_\sigma$ at a fixed scale $\sigma$. Scale-space is
defined as the complete family of functions $f_\sigma$, with $\sigma \in \mathbb{R}_{>0}$.



3 Large-Scale Behavior of Critical Points 

This section presents the theorem that states that for nonnegative functions f
with compact support, it holds that $f_\sigma$ has only one critical point for all $\sigma$ larger
than a certain scale c. However, we first briefly discuss two simple examples,
which show that the theorem cannot hold for general (integrable) functions.

3.1 Arbitrary Number of Critical Points 

Example 1. Derivatives of the normalized Gaussian function define an autoconvolution
algebra, i.e., if $D_k$ denotes a partial derivative operator with multi-index
order $k = (k_1, \ldots, k_n)$, then $D_k g_\sigma * D_l g_\tau = D_{k+l}\, g_{\sqrt{\sigma^2 + \tau^2}}$. In particular this
implies that derivatives of a normalized Gaussian are, topologically speaking, blur
invariant, hence their critical points are preserved regardless of the amount of
blurring.



Example 2. As a second example, note that $\sin(\langle\omega,x\rangle)$ and
$\cos(\langle\omega,x\rangle)$, with $\omega, x \in \mathbb{R}^n$, are
eigenfunctions under Gaussian blurring with eigenvalue
$e^{-\sigma^2\|\omega\|^2/2}$. Thus the same conclusion can be drawn regarding
their (infinite number of) critical points as in the previous example: the
number of critical points remains equal for all scales and is infinite. Cf. the
literature for an analysis of the behavior of the extrema at large scale, in
which functions are considered that are periodic, band-limited, and
one-dimensional.
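
The eigenfunction property of Example 2 can be checked numerically. The sketch
below is only an illustration in one dimension: it assumes the normalized
kernel $g_\sigma(t) = e^{-t^2/(2\sigma^2)}/(\sqrt{2\pi}\,\sigma)$, and the helper
names (`gaussian`, `blur_at`) and the values of `omega`, `sigma`, and `x` are
hypothetical choices, not taken from the article.

```python
import math

def gaussian(t, sigma):
    """Normalized 1-D Gaussian kernel with standard deviation sigma (assumed form)."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def blur_at(f, x, sigma, steps_per_sigma=200, cutoff=10.0):
    """Approximate (f * g_sigma)(x) by a Riemann sum over [-cutoff*sigma, cutoff*sigma]."""
    h = sigma / steps_per_sigma
    n = int(cutoff * steps_per_sigma)
    total = 0.0
    for i in range(-n, n + 1):
        t = i * h
        total += f(x - t) * gaussian(t, sigma) * h
    return total

# Illustrative parameters (assumptions, chosen only for the check).
omega, sigma, x = 2.0, 0.8, 0.3

# Blur sin(omega * x) and compare with the predicted eigenvalue times sin(omega * x).
blurred = blur_at(lambda u: math.sin(omega * u), x, sigma)
predicted = math.exp(-0.5 * sigma ** 2 * omega ** 2) * math.sin(omega * x)
print(blurred, predicted)
```

Because blurring only rescales the sinusoid, the locations of its critical
points do not depend on the scale, which is the point of the example.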

Clearly, in general, the Folklore Theorem does not hold. Moreover, both
examples in this section show that we can construct functions with an arbitrary
number, ranging from one to infinite, of extrema at all scales. Hence, we look
for restrictions on a function that do lead to the behavior that only a single
large-scale extremum exists beyond scale, say, $\sigma$. As Example 1 shows,
requiring the function to satisfy $\lim_{\|x\|\to\infty} f(x) = 0$ is not enough;
it is still possible that an arbitrary number of critical points coexist to an
arbitrarily large scale.



3.2 Nonnegative and Compactly Supported Functions

If we restrict the functions to compactly supported functions, it seems reasonable
to expect the Folklore Theorem. Because the function is compactly supported,
its support is bounded and has a maximal radius, say $r$, so one expects that for
scales that are large enough (proportional to $r$) there is not much more left of
the blurred function than something similar to a Gaussian, in which one can no
longer distinguish any kind of detail. However, as we will see in Subsection 3.3,
this restriction is still too weak to make the theorem hold. We need an extra
restriction, which is the requirement that the function is nonnegative.
Theorem 1. Given a nonnegative function $f : \mathbb{R}^n \to \mathbb{R}$ with
compact support. If $r(f)$ equals $r$, then $f_\sigma$ has exactly one critical
point for every $\sigma > r$. This unique critical point is a maximum.

Refer to Appendix A for the proof.

Remark 1. Without further proof, we mention that the foregoing estimate is
sharp, in the sense that given a support with radius $r$, it is possible to
construct a nonnegative, compactly supported function $f$ such that $f_\sigma$
has multiple extrema for all $\sigma$ arbitrarily close to $r$. The idea is to
construct a nonnegative function $f$ that is compactly supported with
$r(f) = r$, and comes arbitrarily close to the sum of two Dirac functions on
$\mathbb{R}^n$ separated by a distance equal to $2r$.
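
The limiting case in Remark 1 can be illustrated in one dimension. The sketch
below is an assumption-laden toy, not the authors' construction: it takes the
idealized input of two equal Dirac functions at $\pm r$ (so that
$f_\sigma(x) = g_\sigma(x - r) + g_\sigma(x + r)$), and counts sign changes of
the derivative on a grid. The function names, grid size, and sample scales are
hypothetical.

```python
import math

def blurred_derivative(x, sigma, r):
    """Derivative, up to a positive factor, of g_sigma(x - r) + g_sigma(x + r)."""
    def term(u):
        return -u * math.exp(-u * u / (2.0 * sigma * sigma))
    return term(x - r) + term(x + r)

def count_critical_points(sigma, r=1.0, half_width=6.0, samples=4000):
    """Count sign changes of the derivative on a symmetric grid.

    An even number of samples keeps x = 0 (always a critical point by
    symmetry) strictly between two grid points.
    """
    step = 2.0 * half_width / (samples - 1)
    xs = [-half_width + i * step for i in range(samples)]
    signs = [blurred_derivative(x, sigma, r) > 0.0 for x in xs]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

for sigma in (0.9, 1.1):
    print(sigma, count_critical_points(sigma))
```

With $r = 1$, a scale slightly below $r$ shows two maxima separated by a
minimum, whereas a scale slightly above $r$ shows a single maximum, in line
with the bound $\sigma > r$ of Theorem 1.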

The result stated in Theorem 1, in combination with Remark 1, suggests that we
should call $2\sigma$, and not just $\sigma$, the scale of $f_\sigma$ at which
$f$ is observed (cf. [1,7,8]).

Remark 2. Under much weaker assumptions, one has the following weaker result,
which however still might be useful. The proof, of which we do not give the details
here, starts with an estimate from below for the quantity
$\int f(y)\, g_\sigma(x - y)\, dy$, which appears in the denominator of the
mapping $F$ in the proof of Theorem 1. In the estimate, we use the assumption
that $x \in B_{\delta\sigma}$.

Theorem. Assume that $\int_{\mathbb{R}^n} \|y\|^2 |f(y)|\, dy < \infty$ and
$\int_{\mathbb{R}^n} f(y)\, dy > 0$ (the case that
$\int_{\mathbb{R}^n} f(y)\, dy < 0$ can be treated analogously). Let $B_r$ denote
the ball with radius $r$ and center at the origin. Then there exists a
$\delta > 0$ and a $\sigma_0 > 0$ such that for every $\sigma > \sigma_0$ the
restriction of $f_\sigma$ to $B_{\delta\sigma}$ has precisely one critical point
$\xi(\sigma)$, which is a maximum. Furthermore, if $\sigma \to \infty$, then
$\xi(\sigma)$ converges to

\[ \frac{\int_{\mathbb{R}^n} y\, f(y)\, dy}{\int_{\mathbb{R}^n} f(y)\, dy}. \]

In the next subsection it is shown that the restrictions on the function cannot 
be dropped in general. 



3.3 Nonnegative or Compactly Supported Functions 

Dropping one of the requirements on the function in Theorem 1 can lead to
examples of functions for which the theorem does not hold. We present three
examples, which illustrate this. The first two are straightforward and are
presented without rigorous proof. For the third one, it is demonstrated that the
theorem does not hold.

Note that Examples 1 and 2 of Subsection 3.1 illustrate what can happen
when both assumptions are dropped. The first and third example we present in
this subsection do not require the initial function to be compactly supported.
However, what we do require in the third example is that the value of the
function goes to zero as $\|x\|$ goes to infinity. In the second example we drop
the nonnegativity assumption.
Example 3. As in Example 2 we consider periodic functions, however now we
require the functions to be nonnegative. Hence, consider a bounded periodic
function, and add a suitable constant so as to make it nonnegative. Note that a
constant function is blur-invariant, and that its addition has no effect on
critical points. Thus even positive functions may have multiple (in this case
infinitely many) critical points that survive at all scales.

Example 4. In this example, we show that things can go wrong when dropping
the nonnegativity assumption, but not the compactness requirement. We think
that this result is quite counter-intuitive, because one would expect that if the
support of the initial function is bounded, this restricts the size of the possible
details that can be present in a function (e.g. an image) and so there should be
a finite scale $\sigma$ for which $f_\sigma$ consists of one Gaussian-like blob.
Hence $f_\sigma$ should have a unique extremum.

However, it is quite easy to construct compactly supported functions that
do not satisfy Theorem 1. To do so, let $f$ be nonnegative, compactly supported,
$n$-times differentiable, and one-dimensional. First, note that all derivatives
of $f$ are also compactly supported. Now, because $f$ satisfies the assumptions
of the theorem and hence has a single maximum for large scales, its $n$th-order
derivative still has at least $n + 1$ extrema for large scales.

Remark 3. Most images reflect certain physical spatial and/or temporal
measurements, and most of these measured physical quantities are nonnegative,
because there is some clear absolute zero. So, there seems to be, at least in
practice, hardly any loss of generality in restricting the class of admissible
functions to nonnegative functions. However, quite often we do not study the
function itself, but for example its first or second order derivative, or its
Laplacian, and hence there are multiple critical points at large scales in
general.

Example 5. A last example we give is the function

\[ f := \Bigl(\sum_{k=0}^{\infty} 2^{-k\mu}\, \delta_{2^k}\Bigr) * g_s \]

for some $s > 0$; here $\delta_{2^k} := \delta(\cdot - 2^k)$, which is the Dirac
function situated in $2^k$. Interesting about this function $f$ is that its first
$m$th-order moments, with $m < \mu$, are finite, i.e.,
$\int x^m f(x)\, dx < \infty$ for all $m < \mu$, and $f$ goes to zero when $|x|$
goes to infinity.

These two conditions do not hold for the function from Example 3, but do
hold for the compactly supported functions in Theorem 1. One could think of
these conditions as restrictions on the speed of growth of the function, and one
might expect that these requirements are sufficient for the theorem to hold. In
Appendix A, however, it is demonstrated that this nonnegative function has
more than one extremum for large scales $\sigma \in \mathbb{R}_{>0}$.



4 Spatial Constraints for Critical Points 

In Section 3 we gave an explicit estimate of a scale $c$ beyond which there is
only one spatial maximum for $f_\sigma$ when $\sigma > c$. This gives a lower
bound for the scale beyond which it might be needless to analyze the function
$f_\sigma$ any further, because further blurring does not bring further
topological changes to the function as it remains a single blob upon successive
blurring.

In this section we present a result, Theorem 2, that also puts spatial
constraints on the region of interest of scale-space. Again, it is required for
the theorem to hold that the function is nonnegative. Moreover, the result
becomes really useful when the function is also compactly supported, which is
exemplified by Corollary 1.

Theorem 2. Let the function $f$ be nonnegative, then every spatial critical point
of $f_\sigma$ is in the closure of the convex hull of supp(f).

Refer to Appendix A for the proof.

As a direct consequence of Theorems 1 and 2 we can, for example, restrict
the region in scale-space that would be of interest when we want to track down
toppoints (i.e. scale-space catastrophes
$(x,\sigma) \in \mathbb{R}^n \times \mathbb{R}_{>0}$ for which

\[ \nabla f_\sigma(x) = 0 \quad\text{and}\quad \det \nabla\nabla^T f_\sigma(x) = 0 \]

hold) of compactly supported, nonnegative functions $f$.

Corollary 1. For a nonnegative function $f$ with compact support, all toppoints
in scale-space are situated in
$C \times [0, r(f)] \subset \mathbb{R}^n \times \mathbb{R}_{>0}$, where $C$ is
the convex closure of supp(f).

5 Conclusions and Discussion 

This article presented two theorems which, for example, enable us to restrict the 
interesting region in scale-space, with respect to toppoints. 

The first, and main, theorem states that if a nonnegative function with
compact support is blurred sufficiently, there is a scale for which any further
blurring does not give rise to a topological change: the function is a single
blob, and remains a single blob upon successive blurring. This is a particular
version of what we called a “Folklore Theorem”: if a function
$f : \mathbb{R}^n \to \mathbb{R}$ is blurred sufficiently, then the blurred
function has a single critical point, which is an extremum.

This result might be clear intuitively, because the function has only a bounded
domain where one can distinguish details (or structure), and so one expects
that if scale is taken large enough every single detail is lost. However, based
on the same reasoning, we could argue that the theorem is even true for not
necessarily nonnegative functions; a conjecture that has to be rejected, as
Example 4 shows. Hence the Folklore Theorem does not hold in all its generality.
(Besides Example 4, we discussed some other examples, which show that the
Folklore Theorem is not true in general.)

Furthermore, in the same theorem, we give a sharp estimate of the scale
beyond which there are no more details distinguishable in the blurred function.
This scale is equal to $r$, where $r$ equals $r(f)$, the radius of the support of
the nonnegative function $f$. This, in combination with Remark 1, suggests that
we should call $2\sigma$ the scale of $f_\sigma$ at which a function $f$ is
observed.

The second theorem presented spatial constraints for the critical points. This
result is complementary to Theorem 1, which can be interpreted as imposing
scale constraints in scale-space (cf. Corollary 1), but now the constraints are
in the spatial domain. Theorem 2 states that, for every nonnegative, compactly
supported function $f$, every spatial critical point of $f_\sigma$ is in the
closure of the convex hull of the support of the initial function $f$.

We conclude that the scale-space constraints we considered in this article 
are all based merely on the knowledge that the functions under consideration 
are nonnegative and compactly supported. Other function classes, for which 
comparable statements are possible, could be investigated. However, from the 
examples we give, it is clear that it can be hard to formulate such statements 
for different or broader classes of functions. 

Another interesting direction for subsequent research is to generalize
Theorem 1 and give explicit (and sharp) estimates of scales after which there
are only 2 critical points, 3 critical points, etc. left, or to generalize
Theorem 2 in such a way that one can restrict the “support of the spatial
critical points for the function $f_\sigma$” to different domains than the
convex closure of supp(f).



A Proofs and Demonstrations 



This appendix gives the proofs of both Theorems 1 and 2, as well as the proof
of the statement in Example 5. We start with a definition of an inner product
that is used in the proof of Theorem 1. Note that the domain of integration is
omitted in the remainder of this appendix. One should read
$\int_{\mathbb{R}^n}$ for $\int$.

Definition 3. Given a nonnegative function $f$, a spatial coordinate $x$ and a
scale $\sigma$. Define the inner product $\langle\cdot,\cdot\rangle_{x;\sigma}$
for two functions $a : \mathbb{R}^n \to \mathbb{R}$ and
$b : \mathbb{R}^n \to \mathbb{R}$ as

\[ \langle a, b\rangle_{x;\sigma} :=
   \frac{\int a(y)\, b(y)\, f(y)\, g_\sigma(x - y)\, dy}
        {\int f(y)\, g_\sigma(x - y)\, dy}; \]

furthermore define $\bar{a} := \langle a, 1\rangle_{x;\sigma}$.

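Proof. (Theorem 1) To prove the theorem, we first derive a system of equations
that is satisfied by a spatial critical point $\xi \in \mathbb{R}^n$ at a scale
$\sigma$. Clearly, for $\xi$ the following $n$ equations hold:

\[ \frac{\partial f_\sigma}{\partial x_j}(\xi)
   = -\frac{1}{\sigma^2} \int (\xi_j - y_j)\, f(y)\, g_\sigma(\xi - y)\, dy = 0, \tag{1} \]

for all $j \in \{1, \ldots, n\}$. Hence, if $\xi$ is a critical point:

\[ \xi_j \int f(y)\, g_\sigma(\xi - y)\, dy = \int y_j\, f(y)\, g_\sigma(\xi - y)\, dy, \]

and we conclude that $\xi$ satisfies $\xi = F(\xi; \sigma)$, where every
component $F_j : \mathbb{R}^n \times \mathbb{R}_{>0} \to \mathbb{R}$ of $F$ is
defined as:

\[ F_j(x; \sigma) := \frac{\int y_j\, f(y)\, g_\sigma(x - y)\, dy}
                          {\int f(y)\, g_\sigma(x - y)\, dy} = \bar{y}_j. \]

Hence, $\xi$ is a fixed point of $F(\cdot; \sigma)$.

$F(\cdot; \sigma)$ is a contraction if the operator norm of the matrix
$\Phi(x) := \bigl(\tfrac{\partial F_j}{\partial x_k}(x; \sigma)\bigr)_{jk}$
is smaller than 1, i.e., $\|\Phi(x)\| < 1$, for all $x \in \mathbb{R}^n$ [2].
Writing out the entries $\tfrac{\partial F_j}{\partial x_k}(x; \sigma)$ of
$\Phi(x)$ we get:

\[ \frac{\partial F_j}{\partial x_k}
   = \frac{1}{\sigma^2}\bigl(\langle y_j, y_k\rangle_{x;\sigma} - \bar{y}_j\, \bar{y}_k\bigr)
   = \frac{1}{\sigma^2}\langle y_j - \bar{y}_j,\; y_k - \bar{y}_k\rangle_{x;\sigma}. \tag{2} \]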

From Equation (2), it follows that $\Phi(x)$ is a symmetric matrix and so its
operator norm equals the maximum of the absolute values of the eigenvalues of
$\Phi(x)$. Rewrite $\Phi(x)$ as follows:

\[ \Phi(x) = \frac{1}{\sigma^2}\,
   \frac{\int (y - \bar{y})(y - \bar{y})^T f(y)\, g_\sigma(x - y)\, dy}
        {\int f(y)\, g_\sigma(x - y)\, dy}, \]

and note that $\Phi(x)$ is positive semi-definite, because the matrix
$(y - \bar{y})(y - \bar{y})^T$ is positive semi-definite and $f$ is nonnegative.
From this, it follows that the operator norm equals the maximum of the
eigenvalues of $\Phi(x)$ [2]. Furthermore, it holds that the trace of the matrix
$\Phi(x)$ is greater than or equal to this maximum eigenvalue. Let $\lambda$ be
this eigenvalue, then the following holds:

\[ \lambda \le \operatorname{trace}(\Phi(x))
   = \sum_{j=1}^{n} \frac{1}{\sigma^2}\langle y_j - \bar{y}_j,\; y_j - \bar{y}_j\rangle_{x;\sigma}
   = \frac{1}{\sigma^2}\,
     \frac{\int \sum_{j=1}^{n} (y_j - \bar{y}_j)^2 f(y)\, g_\sigma(x - y)\, dy}
          {\int f(y)\, g_\sigma(x - y)\, dy}. \tag{3} \]

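Furthermore, it is easy to verify that
$\int \sum_{j=1}^{n} (y_j - \gamma_j)^2 f(y)\, g_\sigma(x - y)\, dy
 = \int \|y - \gamma\|^2 f(y)\, g_\sigma(x - y)\, dy$
attains its minimal value for $\gamma = \bar{y} \in \mathbb{R}^n$. Now, because
the radius of the convex closure of supp(f) equals $r$, there is an
$m \in \mathbb{R}^n$ for which $\|y - m\| \le r$ for $y \in$ supp(f). And so from
the last two statements, and Equation (3), we have:

\[ \lambda \le \operatorname{trace}(\Phi(x))
   = \frac{1}{\sigma^2}\,
     \frac{\int \sum_{j=1}^{n} (y_j - \bar{y}_j)^2 f(y)\, g_\sigma(x - y)\, dy}
          {\int f(y)\, g_\sigma(x - y)\, dy}
   \le \frac{1}{\sigma^2}\,
     \frac{\int \sum_{j=1}^{n} (y_j - m_j)^2 f(y)\, g_\sigma(x - y)\, dy}
          {\int f(y)\, g_\sigma(x - y)\, dy}
   \le \frac{1}{\sigma^2}\,
     \frac{\int r^2 f(y)\, g_\sigma(x - y)\, dy}
          {\int f(y)\, g_\sigma(x - y)\, dy}
   = \frac{r^2}{\sigma^2}. \]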



As already stated, the function $F(\cdot; \sigma)$ is a contraction if
$\lambda < 1$. The function $F(\cdot; \sigma)$ being a contraction directly
implies that a spatial critical point $\xi$ is unique. Hence $f_\sigma$ has a
unique spatial critical point if $r^2/\sigma^2 < 1$. So for all $\sigma > r$,
$f_\sigma$ has a unique critical point, which is a maximum, because $f_\sigma$
is nonnegative and $\lim_{\|x\|\to\infty} f_\sigma(x) = 0$. □
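
The fixed-point characterisation used in the proof can be tried out directly.
The following is a minimal one-dimensional sketch under stated assumptions: the
nonnegative input is replaced by a hypothetical discrete stand-in (weighted
point masses `points`, `weights` inside $[-1, 1]$, so $r = 1$), and the map
`F` is the weighted mean $\bar{y}$ from the proof. All names and numbers are
illustrative, not part of the article.

```python
import math

def F(x, sigma, points, weights):
    """Weighted mean of the points y_j under weights w_j * g_sigma(x - y_j).

    The Gaussian normalization cancels between numerator and denominator.
    """
    k = [w * math.exp(-(x - y) ** 2 / (2.0 * sigma * sigma))
         for y, w in zip(points, weights)]
    return sum(ki * y for ki, y in zip(k, points)) / sum(k)

def critical_point(x0, sigma, points, weights, iterations=200):
    """Iterate x <- F(x; sigma); converges when F(.; sigma) is a contraction."""
    x = x0
    for _ in range(iterations):
        x = F(x, sigma, points, weights)
    return x

# Hypothetical nonnegative point masses with support radius r = 1.
points = [-1.0, 0.2, 1.0]
weights = [1.0, 2.0, 0.5]
sigma = 1.5  # larger than r, so the proof predicts a unique fixed point

left = critical_point(-5.0, sigma, points, weights)
right = critical_point(5.0, sigma, points, weights)
print(left, right)
```

Both starting points lead to the same critical point, and since every iterate
is a weighted mean of the `points` with nonnegative weights, it stays within
their convex hull, which is also the content of Theorem 2.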
Proof. (Theorem 2) Let $a \in \mathbb{R}^n$ and $c \in \mathbb{R}$, such that for
every $y \in$ supp(f), we have $\langle y, a\rangle \le c$ (here
$\langle\cdot,\cdot\rangle$ is the standard inner product between two vectors).
Note that $\langle\cdot, a\rangle \le c$ defines a half-space in $\mathbb{R}^n$,
and so the foregoing states that supp(f) is situated in this half-space.

Now assume $\xi$ to be a critical point of $f_\sigma$ ($\sigma$ is fixed), then
the $n$ Equations (1) satisfied by a critical point imply that

\[ \langle \xi, a\rangle \int f(y)\, g_\sigma(\xi - y)\, dy
   = \int \langle y, a\rangle\, f(y)\, g_\sigma(\xi - y)\, dy
   \le c \int f(y)\, g_\sigma(\xi - y)\, dy. \]

Here it is used that $f$ is nonnegative to obtain the inequality. This inequality
in turn implies that $\langle \xi, a\rangle \le c$. Now, the intersection of all
closed half-spaces determined by $\langle\cdot, a\rangle \le c$ and containing
supp(f) equals the closure $C$ of the convex hull of supp(f). Furthermore,
because for a critical point $\xi$ we also have that
$\langle \xi, a\rangle \le c$ for all the former choices of
$a \in \mathbb{R}^n$ and $c \in \mathbb{R}$, we conclude that all critical points
reside within the same closure $C$. □



Proof. (Example 5) Let $\phi := \sum_{k=0}^{\infty} 2^{-k\mu}\, \delta_{2^k}$. If
this distribution has the property of having multiple extrema for all scales
larger than a certain scale $\sigma$, then the function $f := \phi * g_s$ has the
same property.

To show that $\phi$ possesses this property, we start by taking the derivative
of $\phi_\sigma$, which, up to a positive factor, equals
$\phi'_\sigma(x) = \sum_{k=0}^{\infty} 2^{-k\mu} (2^k - x)\, g_\sigma(x - 2^k)$,
with $\sigma > 0$.

Firstly, the function $\phi'_\sigma$ is positive for $x$ smaller than 1.
Secondly, for $\sigma > 0$, $\phi'_\sigma(2)$ is negative. To see this, note that
for $x = 2$, the first summand equals
$2^{-0\mu}(2^0 - x)\, g_\sigma(x - 2^0) = -g_\sigma(1)$, which is negative.
Furthermore, for the other summands ($k \ge 1$), which are nonnegative, the
following holds:

\[ 2^{-k\mu}(2^k - 2)\, g_\sigma(2 - 2^k) \le 2^{-k\mu}\, 2^k\, g_\sigma(2^{k-1}) \le 2^{-k} g_\sigma(1), \]

hence

\[ \sum_{k=1}^{\infty} 2^{-k\mu}(2^k - x)\, g_\sigma(x - 2^k) < \sum_{k=1}^{\infty} 2^{-k} g_\sigma(1) = g_\sigma(1), \]

which shows that the latter statement holds.

Thirdly, for every $\sigma$ there is an $x > 2$ for which $\phi'_\sigma(x)$ is
positive. Proof: take $x = 2^p - \sigma$ ($p \in \mathbb{Z}_{>0}$), then the
summand of $\phi'_\sigma$ where $k = p$ equals $2^{-p\mu} \sigma e^{-1/2}$.
Now, choose $p$ such that $2^{p-2} > \sigma$ and such that
$2^{(p-3)/2} > \sigma$ (implying that $2\sigma^2 < 2^{p-2}$), then the following
holds:

\[ -\sum_{k=0}^{p-1} 2^{-k\mu}(2^k - 2^p + \sigma)\, g_\sigma(2^p - \sigma - 2^k)
   < \sum_{k=0}^{p-1} 2^{-0\mu}\, 2^p\, g_\sigma(2^{p-1} + 2^{p-2} - 2^k)
   < \sum_{k=0}^{p-1} 2^p\, g_\sigma(2^{p-2})
   < p\, 2^p\, e^{-(2^{p-2})^2/(2\sigma^2)}
   < p\, 2^p\, e^{-2^{p-2}}. \]

Because $e^{-2^{p-2}}$ goes to 0 extremely rapidly when $p$ goes to infinity, it
follows that there is a $p \in \mathbb{Z}_{>0}$ for which

\[ -\sum_{k=0}^{p-1} 2^{-k\mu}(2^k - 2^p + \sigma)\, g_\sigma(2^p - \sigma - 2^k)
   < p\, 2^p\, e^{-2^{p-2}} < 2^{-p\mu} \sigma e^{-1/2}. \]

Hence, taking $x = 2^p - \sigma$ gives $\phi'_\sigma(x) > 0$.

Now, from the three foregoing statements, we have that $\phi'_\sigma(0)$,
$\phi'_\sigma(2)$, and $\phi'_\sigma(2^p - \sigma)$ are positive, negative, and
positive, respectively. Finally, because $\phi'_\sigma(x)$ is negative as
$x \to \infty$, we conclude that for every scale $\sigma > 0$, $\phi_\sigma$ has
at least three extrema. □
Abstract. We define mutually consistent scale-space theories for scalar
and vector images. Consistency pertains to the connection between the 
already established scalar theory and that for a suitably defined scalar 
field induced by the proposed vector scale-space. We show that one is 
compelled to reject the Gaussian scale-space paradigm in certain cases 
when scalar and vector fields are mutually dependent. 

Subsequently we investigate the behaviour of critical points of a
vector-valued scale-space image (i.e. points at which the vector field vanishes)
as well as their singularities and unfoldings in linear scale-space.



1 Introduction 



It has been argued by Koenderink [8] that a
$C^2_1(\mathbb{R}^n \times \mathbb{R}^+)$ function¹ $u(x; s)$, with scale
parameter $s \in \mathbb{R}^+$, is a reasonable choice for a multiscale
representation of a scalar image $f(x)$ if, at the location of spatial extrema,

\[ \partial_s u\, \Delta u > 0. \]

Koenderink proposed to take the simplest instance of such a representation,
which led him to consider the linear heat equation²,

\[ \partial_s u = \Delta u, \qquad \lim_{s \to 0} u = f, \tag{1} \]

as the generating equation for a scale-space representation. Thus the argument
pertains to the behaviour of extrema, and, in the case of the heat equation,
basically boils down to the (strong) maximum principle³.

¹ A function $u : \mathbb{R}^n \times \mathbb{R}^+ \to \mathbb{R}$ is in
$C^2_1(\mathbb{R}^n \times \mathbb{R}^+)$ if it is twice continuously
differentiable with respect to $x \in \mathbb{R}^n$ and once with respect to
$s \in \mathbb{R}^+$. If $u \notin C^2_1(\mathbb{R}^n \times \mathbb{R}^+)$ then
there are other possibilities, cf. Van den Boomgaard et al.

² Note that $\Delta u \neq 0$ at a spatial extremum.

In this article we wish to generalise the scale-space paradigm to vector-valued 
images. A nontrivial demand to be reckoned with is mutual consistency: If the 
vector field implies the existence of a scalar field, as is for instance the case in a 
vector space endowed with a scalar product, then a multiscale representation of 
the latter must be consistent with the one induced by a multiscale representation 
of the former. We show that this implies that one is compelled to reject the 
Gaussian scale-space paradigm in certain cases. 

Subsequently we concentrate on linear vector scale-space representations and 
investigate the behaviour of its critical points — i.e. points at which the vector 
field vanishes — as well as their singularities and unfoldings in scale-space. 



2 Theory 

2.1 Vector Scale-Space 



Once one appreciates that the addition of first order terms on the right hand
side of Eq. (1) does not violate Koenderink’s causality principle, it becomes
relatively straightforward to generalise the causality argument to vector images
in such a way that consistency is manifest. That is to say, the scale-space of an
arbitrary scalar field generated by the vector field (e.g. its magnitude) should
be compatible with the already established theory for scalar images. In order to
appreciate why first order terms are sometimes necessary, consider the following
example.



Example 1. Suppose we would require the magnitude of a suitably defined
multiscale vector-valued image to satisfy the linear Eq. (1), i.e. without a
first order nonlinearity, then we are confronted with a dilemma: As the norm of
the high-resolution vector image (at scale $s = 0$, say) is a positive function⁴
with isolated zeros only, its finite-scale representation must be everywhere
strictly positive as a consequence of the maximum principle. Thus wherever
defined, the scale-space representation of the vector field itself can have no
critical points (spatial loci at which the vector field’s components vanish for
fixed scale $s > 0$), not even at “infinitesimal” scales. Elsewhere it is
explained that if the components $v$ and $w$ satisfy the following coupled system
of p.d.e.’s

\[ \partial_s v = \Delta v - \frac{1}{v^2 + w^2}\, (w \nabla v - v \nabla w) \cdot \nabla w, \]
\[ \partial_s w = \Delta w - \frac{1}{v^2 + w^2}\, (v \nabla w - w \nabla v) \cdot \nabla v, \]

then $u = \sqrt{v^2 + w^2}$ indeed satisfies the linear Eq. (1). The problem
alluded to above is reflected in the denominator occurring on the right hand
sides. In fact, these equations do not admit an initial condition with critical
points.

³ It should be stressed that this does not imply that extrema cannot be created
as scale increases, as is sometimes claimed in the literature, cf. Simmons et al.
Damon [2] has shown that such creations do in fact occur generically, although
they are typically outnumbered by annihilations.

⁴ A function $f$ is called positive if $f(x) \ge 0$ for almost all
$x \in \mathbb{R}^n$ and $\int f(x)\, dx \neq 0$.

The scaling behaviour of the vector field’s critical points outlined in the
example is a highly undesirable situation, and so we must reject Eq. (1) as a
viable scale-space representation for the norm of a vector field (or any other
scalar induced by the vector field for that matter). Critical points of a vector
field should be able to survive a range of physical scales. The Poincaré-Hopf
theorem bears witness to the fact that in spite of their zero measure they are
not something “infinitesimal”.

The way out of the dilemma is of course to relax the constraint on scalars
induced by the vector field. None of these should be subjected to the linear heat
equation for reasons amply discussed. Indeed, insisting on Koenderink’s scale
causality demand does not compel us to assume the existence of any scalar
satisfying Eq. (1), notably linearity, as this is only a special case of a more
general class. Only if the scalar image is not some quantity derived from
another physical observable, i.c. a vector field, the simplifying assumption of
linearity may be justified, as there is no additional consistency demand to be
met in that case.

Vice versa, if we are given a scalar image and (thus) adopt Eq. (1) as our
paradigm, the equations for the induced vector field $v = \nabla u$ are
obviously linear, too:

\[ \partial_s v = \Delta v, \qquad \lim_{s \to 0} v = v_0, \tag{2} \]

in which $v_0$ is the high resolution gradient image. We may subsequently adopt
Eq. (2) as the paradigm for vector fields in general, i.e. beyond the class of
gradient fields. Apart from being the most straightforward thing to do in the
first place, it is easy to verify that this is indeed consistent with the scalar
theory, provided we refrain from the linearity condition for vector-induced
scalar fields. Indeed, any function of the magnitude $\rho = \sqrt{v \cdot v}$
satisfies an admissible semilinear diffusion equation. Let us consider some
examples.



Example 2. In two dimensions we may introduce a complex scalar field,
$u = v + iw \in \mathbb{C}$ (respectively $f = v_0 + i w_0$), where $(v, w)$ are
the Cartesian components of the vector field. The form of Eq. (2) then formally
reduces to that of Eq. (1). If we write $u = \rho\, e^{i\phi}$ then we have

\[ \partial_s \rho = \Delta \rho - \rho\, \|\nabla \phi\|^2, \qquad
   \partial_s \phi = \Delta \phi + \frac{2}{\rho}\, \nabla \rho \cdot \nabla \phi, \qquad
   \lim_{s \to 0} (\rho, \phi) = (\rho_0, \phi_0). \]

The first order terms in the equation for $\rho$ prevent the magnitude from
becoming instantly everywhere positive, as opposed to the case in which they are
absent. Thus critical points in the initial condition ($\rho_0(x_c) = 0$) are
able to survive a finite amount of blur, as it should. Note also that the
weighted angle $\psi = \rho\phi$ satisfies an equation of the same form as that
of $\rho$. Using this function instead of $\phi$ one may thus circumvent the
singularity $\rho = 0$ (implying $\phi$ is ill-defined but $\psi = 0$) on the
right hand side.

We have seen that linear equations for the vector image imply nonlinear
equations for induced scalar images. The example of a scalar image and its
gradient illustrates the possibility of linearity in both domains. The next
example, finally, shows that it may also be necessary to add nonlinear terms to
the defining equations for a multiscale vector field derived from a linear scalar
image.

Example 3. Elsewhere a theory has been proposed for multiscale motion
extraction consistent with the scale-space paradigm for the underlying scalar
image $u(x; s)$. It has been noted that the operationally defined multiscale
motion field $v(x; s)$ does not satisfy Eq. (2), but that the flux field
$j(x; s) = u(x; s)\, v(x; s)$ by construction does. From the fact that $u$ and
$j$ satisfy the linear diffusion equation it follows that the motion field itself
satisfies

\[ \partial_s v = \Delta v + \frac{2}{u}\, (\nabla u \cdot \nabla)\, v. \]

Here the occurrence of $u$ in the denominator poses no problem since it is
manifestly positive at scales $s > 0$, cf. Eq. (1).

All the examples given indicate that one must be cautious about which scalar or
vector field one should subject to linear equations, Eq. (1) respectively
Eq. (2), at least in cases where mutual dependencies exist (vector field versus
magnitude field, scalar field versus motion field, et cetera). Otherwise a linear
equation is the natural choice a priori.
2.2 Behaviour of Critical Points



Next we study the behaviour of critical points given Eq. (2). Recall that a
critical point is defined as the spatial locus $x_c$ at which $v(x_c; s) = 0$ for
any fixed $s$.

In the generic case, assuming $v(x; s)$ is a Morse function for some given scale
$s$, the critical points are all isolated. As scale increases each such point
will move along a critical path. The situation is similar to that of critical
points in a scalar image (in that context defined as spatial loci at which the
image gradient vanishes) and can be analysed in a similar fashion. The situation
for non-gradient fields is slightly more complicated, however.



Theorem 1. Let $J$ be the Jacobian matrix with components
$J^\mu_\nu = \partial_\nu v^\mu$, with row and column indices
$\mu, \nu = 1, \ldots, n$, respectively. Furthermore, let $\tilde{J}$ be its
associated cofactor matrix, with components $\tilde{J}^\nu_\mu$ (in which upper
indices are now row indices) and $\det J$ the Jacobian determinant. A tangent to
a parametrised critical path
$\Gamma : \mathbb{R} \to \mathbb{R}^n \times \mathbb{R}^+ : \lambda \mapsto (x(\lambda); s(\lambda))$
is then given by

\[ \begin{pmatrix} \dot{x} \\ \dot{s} \end{pmatrix}
   = \begin{pmatrix} -\tilde{J}\, \Delta v \\ \det J \end{pmatrix}, \]

in which a dot indicates differentiation with respect to $\lambda$. It is
understood that the right hand side is evaluated at the location of the critical
point.

Note that $\Delta v = \partial_s v$. The definition of the cofactor matrix is
reviewed in the appendix. The essential property is
$J \tilde{J} = \det J\, I$.

Proof. The derivation is essentially the same as presented elsewhere for critical
paths in a scalar image [5]. The theorem is easily verified by noticing that the
tangent must satisfy the $n$ equations

\[ J \dot{x} + \Delta v\, \dot{s} = 0, \]

which follows by inspection of the first order Taylor terms of $v$ at the
location of a spatial critical point. Insertion of the right hand side in
Theorem 1, using $J \tilde{J} = \det J\, I$, readily shows that it indeed
satisfies this constraint.
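
As a quick sanity check of the tangent formula, the sketch below evaluates it
for a two-dimensional field and verifies the constraint
$J\dot{x} + \Delta v\,\dot{s} = 0$. It assumes the convention that the rows of
`J` are the gradients of the components $v$ and $w$; the function names and the
sample numbers are hypothetical and serve only as an illustration.

```python
def critical_path_tangent(J, lap):
    """Return (xdot, ydot, sdot) = (-adj(J) @ lap, det J) for a 2-D vector field.

    J   = ((dv/dx, dv/dy), (dw/dx, dw/dy))   (assumed row/column convention)
    lap = (Laplacian of v, Laplacian of w)
    """
    (vx, vy), (wx, wy) = J
    lap_v, lap_w = lap
    det = vx * wy - vy * wx
    # Cofactor (adjugate) matrix, satisfying J @ adj = det(J) * I.
    adj = ((wy, -vy), (-wx, vx))
    xdot = -(adj[0][0] * lap_v + adj[0][1] * lap_w)
    ydot = -(adj[1][0] * lap_v + adj[1][1] * lap_w)
    return xdot, ydot, det

def constraint_residual(J, lap, tangent):
    """Evaluate J @ (xdot, ydot) + lap * sdot, which should vanish."""
    xdot, ydot, sdot = tangent
    return tuple(J[i][0] * xdot + J[i][1] * ydot + lap[i] * sdot for i in range(2))

# Illustrative derivative values at a critical point (not from the article).
J = ((1, 2), (3, 4))
lap = (5, 6)
tangent = critical_path_tangent(J, lap)
print(tangent, constraint_residual(J, lap, tangent))
```

The third component of the tangent is $\det J$, so the path turns horizontal in
scale-space exactly where the Jacobian degenerates.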

It follows from Theorem 1 that as long as the Jacobian does not degenerate the
critical path intersects the planes of constant scale transversally. However, as
soon as $\det J = 0$, the critical path becomes horizontal, and the critical
point changes its character as it “reverses in scale”. This indicates either an
annihilation or a creation (depending on the sign of curvature of the critical
path at the singularity) of a pair of critical points with opposite Poincaré
indices.

Theorem 1 applies to arbitrary dimensions. Let us scrutinise the 2-dimensional
situation for simplicity. In that case the explicit form of the scale-space
tangent to the critical path becomes

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{s} \end{pmatrix}
   = \begin{pmatrix}
       \partial_y v\, \Delta w - \Delta v\, \partial_y w \\
       \Delta v\, \partial_x w - \partial_x v\, \Delta w \\
       \partial_x v\, \partial_y w - \partial_y v\, \partial_x w
     \end{pmatrix}. \]

Note that the right hand side is just the outer product
$\nabla v \times \nabla w$, in which $\nabla$ is the scale-space gradient with
components $(\partial_x, \partial_y, \partial_s)$, and in which $v$ and $w$ are
the Cartesian components of the vector field.

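Let us now consider the classification of possible critical points. The
eigenvalue equation for $J$ is

\[ \lambda^2 - \operatorname{tr} J\, \lambda + \det J = 0, \]

with $\operatorname{tr} J = \operatorname{div} v$. Analysis then shows that we
may distinguish the following cases (it is understood that $v = 0$ in the points
of interest):

1. If $\operatorname{tr}^2 J < 4 \det J$ then
   $\lambda_1, \lambda_2 \in \mathbb{C} \setminus \mathbb{R}$,
   $\lambda_1 = \lambda_2^*$, say $\lambda_{1,2} = \lambda \pm i\mu$ with
   $\lambda, \mu \in \mathbb{R}$ and $\mu \neq 0$.

2. If $\operatorname{tr}^2 J = 4 \det J$ then
   $\lambda_1, \lambda_2 \in \mathbb{R}$,
   $\lambda_1 = \lambda_2 = \lambda \neq 0$, so we have a symmetric nodal point:

   2a. If $\lambda < 0$ we have a symmetric sink.

   2b. If $\lambda > 0$ we have a symmetric source.

3. If $\operatorname{tr}^2 J > 4 \det J$ then
   $\lambda_1, \lambda_2 \in \mathbb{R}$, $\lambda_1 \neq \lambda_2$.

The first and last case can be further subdivided as follows.

1a. If $\lambda < 0$ we have a spiral sink.

1b. If $\lambda = 0$ we have a central point.

1c. If $\lambda > 0$ we have a spiral source.

3a. If $\det J < 0$ then
    $\operatorname{sign} \lambda_1 = -\operatorname{sign} \lambda_2$, so that we
    have a saddle point.

3b. If $\det J = 0$ then $\lambda_1 = 0$, $\lambda_2 = \operatorname{div} v$ or
    vice versa, which corresponds to a generic degeneracy.

3c. If, finally, $\det J > 0$ then
    $\operatorname{sign} \lambda_1 = \operatorname{sign} \lambda_2$, so that we
    have a nodal point. Still a further subdivision can be made (note that
    $\operatorname{sign} \lambda_{1,2} = \operatorname{sign} \operatorname{div} v$):

    3c1. If $\lambda_1 < 0$ we have an asymmetric sink.

    3c2. If $\lambda_1 > 0$ we have an asymmetric source.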



Figure 1 shows all types of nondegenerate critical points. (All figures may be
arbitrarily rotated.) 
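
The case distinction above depends only on the trace and determinant of the
2x2 Jacobian, so it can be written down compactly. The sketch below is one
possible encoding, with a hypothetical function name, a numerical tolerance
`tol` that is not part of the article, and an extra fallback label for the
doubly degenerate situation that the list excludes.

```python
def classify_critical_point(J, tol=1e-12):
    """Classify a critical point of a 2-D vector field from its Jacobian J."""
    (a, b), (c, d) = J
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4.0 * det  # sign separates cases 1, 2 and 3

    if disc < -tol:
        # Case 1: complex conjugate eigenvalues with real part tr / 2.
        if abs(tr) <= tol:
            return "centre point"
        return "spiral sink" if tr < 0 else "spiral source"
    if abs(disc) <= tol:
        # Case 2: coinciding real eigenvalues, both equal to tr / 2.
        if abs(tr) <= tol:
            return "doubly degenerate (not covered by the listed cases)"
        return "symmetric sink" if tr < 0 else "symmetric source"
    # Case 3: two distinct real eigenvalues.
    if det < -tol:
        return "saddle"
    if abs(det) <= tol:
        return "generic degeneracy"
    return "asymmetric sink" if tr < 0 else "asymmetric source"

# Illustrative Jacobians (assumed values, one per type).
print(classify_critical_point(((-1.0, -2.0), (2.0, -1.0))))
print(classify_critical_point(((1.0, 0.0), (0.0, -1.0))))
```

Working from the trace and determinant avoids computing the eigenvalues
themselves; the tolerance decides when a numerically computed Jacobian is
treated as lying on one of the boundary cases.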

Fig. 1. Nondegenerate Critical Points. Upper row, from left to right: spiral sink,
centre point, spiral source. Lower row: saddle, (asymmetric) sink, (asymmetric)
source.

The nature of the generic degeneracy (case 3b) is particularly interesting, for
it is precisely at such a point in scale-space that the vector field changes
topologically. In two dimensions the vanishing of one eigenvalue of the Jacobian
matrix along a critical path implies that the second eigenvalue must be real at
the singularity, for if $\lambda_1 \in \mathbb{C} \setminus \mathbb{R}$ then
$\lambda_2 = \lambda_1^*$. Moreover, in the generic case under consideration,
$\lambda_2$ does not vanish at the same time as $\lambda_1$ does. This follows
from the fact that if $\lambda_1 = 0$ then
$\lambda_2 = \lambda_1 + \lambda_2 = \operatorname{tr} J = \operatorname{div} v$.
In the generic case the common points of the scale-space surface $\det J = 0$
and the curve $v = 0$ are isolated. The probability that such a point lies within
the surface $\operatorname{div} v = 0$ is zero. This admits only one possible
type of event, viz. a saddle point colliding with (i.e. annihilating with or
emerging from) a nodal point, i.e. a sink or source node. Whether we have an
annihilation or a creation depends on the (sign of the) curvature of the critical
path at the singularity (v.i.). Figure 2 shows the vector field in the
neighbourhood of a degenerate critical point.




Fig. 2. Generic Degeneracy (Unperturbed). 



The above analysis implies that a spiral node (case 1a or 1c) is stable in the
sense that it can only cease to exist by first transforming into a critical point of
type 3c1, respectively 3c2, via a corresponding symmetric nodal point, i.e. case
2, after which an annihilation with a saddle point may occur. This sequence of
events corresponds to a pair of complex conjugate eigenvalues approaching the
real axis and scattering off in horizontal direction after collision. As soon as one
of the eigenvalues reaches 0 we get an event of type 3b. Vice versa, spiral nodes
cannot be created spontaneously; the above-sketched sequence must be traversed
in opposite direction, starting out from a creation of a saddle and a nodal point,
whereby the latter turns into a spiral node via an intermediate symmetric nodal
point. See Figure 3.




Fig. 3. Mutation diagram of (the eigenvalues associated with) a critical point in
the $\mathbb{C}$-plane, showing a spiral node (nonreal complex conjugate points approaching
the $\mathbb{R}$-axis) transforming into an asymmetric nodal point (pair of distinct
real points moving along the $\mathbb{R}$-axis) via a symmetric nodal point (scatterpoint
on the $\mathbb{R}$-axis). As soon as one of the scattered points reaches the origin we obtain
a degeneracy. The velocities of corresponding points are mirror-symmetric
relative to the real axis.



Theorem 1 implies that at the scale-space location of a catastrophe at which a
saddle and a nodal point collide the tangent to the critical path is horizontal
($\dot{s} = 0$). In order to distinguish between creation and annihilation events involving
a saddle and a nodal point we need a higher order local analysis.



Example 4. Consider the vector field germs, together with their perturbations.



Va(a;,?/,s) = 



X* -I- 2s 

0 



and Vc(x,y, s) = 



— 2?/^ — 2s' 
— 4xy 



The second term on each right hand side is the canonical form of a typical
perturbation on a full scale-space neighbourhood of the fiducial origin, $(x, y, s) =
(0, 0, 0)$. The field $\mathbf{v}_a(x, y, s)$ captures an annihilation, $\mathbf{v}_c(x, y, s)$ a creation event
at the origin. The critical paths are given by $\Gamma_a : (x, y, s) = (\pm\sqrt{-2s}, 0, s)$, $s < 0$,
and $\Gamma_c : (x, y, s) = (\pm\sqrt{2s}, 0, s)$, $s > 0$, respectively. Figure 4 illustrates the
behaviour of these fields near the origin.



These germs and their perturbations are analogous to those describing the
unfoldings of corresponding annihilation and creation events in scalar images, cf.
Damon.

Fig. 4. Unfolding of Generic Singularities. Upper row: annihilation event. Lower
row: creation event. Scale (resolution) increases (decreases) from left to right.



Critical points of a vector field all coincide with global minima of the vector
field’s magnitude, but the latter generally has additional critical points (maxima,
saddles and local minima), viz. those points where $\mathbf{v}\cdot\nabla\mathbf{v} = J^{\mathsf{T}}\mathbf{v} = 0$ and $\mathbf{v} \neq 0$,
i.e. where the vector field itself happens to be nonvanishing and orthogonal to
its gradient (note that this implies degeneracy of the Jacobian).

Inspection of the gradients and Hessian matrices of the scalar images $u_{a,c}(x, y, s)
= \|\mathbf{v}_{a,c}(x, y, s)\|^2$ as a function of scale reveals that at a singularity of the vector
field there is an intersection of critical paths with the shape of a pitchfork
pointing downward in the annihilation event and upward in the creation event.
In the first case a saddle collides with two global minima (the critical points of
$\mathbf{v}_a$) at $(x, y, s) = (0, 0, 0)$, after which a single local minimum emerges. In the
second case a local minimum spawns two global minima (the critical points of
$\mathbf{v}_c$) at $(x, y, s) = (0, 0, 0)$, together with a saddle. In both cases the Poincaré
index of a spatial region containing (only) the critical points under consideration
remains invariant, as it should; see Figure 5. (In this argument “local minima” may
be replaced by “local maxima” depending on the case of interest.)

The critical points of the vector field’s magnitude that are not critical points of 
the vector field itself define critical paths in scale-space implicitly defined by 



$$\mathbf{q}(\mathbf{x}; s) = \mathbf{v}(\mathbf{x}; s) \cdot \nabla\mathbf{v}(\mathbf{x}; s) = 0 .$$

Fig. 5. Unfolding of the generic singularities of vector field’s magnitude that
correspond to those of the vector field itself. Left: annihilation event. Right:
creation event. Scale (resolution) increases (decreases) upward. The singularity
always involves two global minima as well as a saddle/local minimum/maximum
pair.



In the two-dimensional case at hand we have two equations, one for each component
of $\mathbf{v} = (v, w)$, and three unknowns, $(\mathbf{x}; s) = (x, y, s)$. Consequently we again
expect to find critical paths in scale-space, which can be analysed as previously.
If we expand the defining equation to second order we obtain

$$\mathbf{q}(\mathbf{x}; s) = \mathbf{q} + \nabla\mathbf{q}\cdot\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^{\mathsf{T}}\cdot\nabla\nabla^{\mathsf{T}}\mathbf{q}\cdot\mathbf{x} + \partial_s\mathbf{q}\, s + \text{h.o.t.}$$

in which the coefficients on the right hand side are partial derivatives evaluated
at the origin. Setting the left hand side equal to zero, and defining the matrix

$$Q \overset{\text{def}}{=} \nabla\mathbf{q}, \qquad (3)$$

with components $q_{ij} = \partial_i(\mathbf{v}\cdot\partial_j\mathbf{v})$, it follows that a scale-space tangent to the
critical path is given by the following theorem, the derivation of which is completely
analogous to that of Theorem 1.



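Theorem 2. A tangent to a parametrised critical path $\Gamma : \mathbb{R} \to \mathbb{R}^n \times \mathbb{R} : \lambda \mapsto
(\mathbf{x}(\lambda); s(\lambda))$ through a local critical point of the magnitude field $u(\mathbf{x}; s) = \|\mathbf{v}(\mathbf{x}; s)\|$
is given by

$$\begin{pmatrix} \dot{\mathbf{x}} \\ \dot{s} \end{pmatrix} = \begin{pmatrix} -\tilde{Q}\,\partial_s\mathbf{q} \\ \det Q \end{pmatrix}$$

in which a dot indicates differentiation with respect to the curve parameter $\lambda$,
and in which $Q$ is the matrix as defined in Eq. (3). It is understood that the
right hand side is evaluated at the location of the critical point.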



Note that $\partial_s\mathbf{q} = \Delta\mathbf{v}\cdot\nabla\mathbf{v} + \mathbf{v}\cdot\nabla\Delta\mathbf{v}$.

The critical path is transversal to $s$-planes if and only if $\det Q \neq 0$, in which
case we may use $\lambda = s$ itself as a valid curve parameter. If $\det Q = 0$ the curve
becomes horizontal. Assuming that $\mathbf{v} \neq 0$ at that singularity (the other case has
been discussed), we have either an annihilation or a creation of local critical
points, one of which is a saddle, the other a maximum or local minimum. The
criterion is again the sign of the critical path’s curvature at the singularity. This
case falls within the scope of Damon’s analysis. See Figure 6.




Fig. 6. Unfolding of the generic singularities of vector field’s magnitude that do 
not involve critical points of the vector field itself. Left: annihilation event. Right: 
creation event. Scale (resolution) increases (decreases) upward. The singularity 
always involves a saddle/local minimum/maximum pair. The situation is similar 
to that encountered in scalar images. 



A Cofactor Matrix 



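Let $A$ be a square $n \times n$ matrix with components $A_{\mu\nu}$. Then we define the
transposed cofactor matrix $\tilde{A}$ as follows. In order to obtain the matrix entry
$\tilde{A}^{\mu\nu}$ skip the $\mu$-th column and $\nu$-th row of $A$, evaluate the determinant of the
resulting submatrix, and multiply by $(-1)^{\mu+\nu}$ (“checkerboard pattern”). Using
tensor index notation,

$$\tilde{A}^{\mu\nu} \overset{\text{def}}{=} \frac{1}{(n-1)!}\, \varepsilon^{\mu\mu_1\ldots\mu_{n-1}}\, \varepsilon^{\nu\nu_1\ldots\nu_{n-1}}\, A_{\nu_1\mu_1} \cdots A_{\nu_{n-1}\mu_{n-1}} ,$$

in which $\varepsilon^{\mu_1\ldots\mu_n}$ is the spatial Levi-Civita tensor, defined as the normalised,
completely antisymmetric symbol with $\varepsilon^{1\ldots n} = 1$.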



An important property of the cofactor matrix is

$$A\tilde{A} = \tilde{A}A = \det A \; I ,$$

in which $I$ is the $n \times n$ identity matrix. Thus if a matrix is invertible, its cofactor
matrix equals its inverse times its determinant. Unlike the inverse matrix,
however, the cofactor matrix is always well-defined, because its coefficients are
homogeneous polynomials of degree $n-1$ relative to the coefficients of the original
matrix.
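
The verbal recipe for the transposed cofactor matrix translates directly into code. The sketch below is only an illustration under our own naming (`transposed_cofactor` is a hypothetical helper, not something defined in the text); it builds each entry from the corresponding minor, so it also works for singular matrices.

```python
import numpy as np

def transposed_cofactor(A):
    """Transposed cofactor (adjugate) matrix built from minors.

    Entry (mu, nu) is obtained by deleting column mu and row nu of A,
    taking the determinant of what remains, and applying the
    checkerboard sign (-1)**(mu + nu).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    adj = np.empty_like(A)
    for mu in range(n):
        for nu in range(n):
            minor = np.delete(np.delete(A, nu, axis=0), mu, axis=1)
            adj[mu, nu] = (-1) ** (mu + nu) * np.linalg.det(minor)
    return adj
```

Multiplying a matrix by the result on either side should give the determinant times the identity, including the singular case where the inverse does not exist.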



From the definition it follows that 1 = 1. Moreover, r = r for any number rGiR, 
and if B = A then A = B . 



The following lemma has been used in the foregoing text.

Lemma 1. For any matrix $A$ we have

$$\nabla \det A = \operatorname{tr}(\tilde{A}\,\nabla A) .$$

Proof. For invertible matrices the proof goes as follows. Write $\det A = \exp\operatorname{tr}\ln A$.
Taking the gradient yields $\nabla\det A = \nabla(\operatorname{tr}\ln A)\exp(\operatorname{tr}\ln A) = \operatorname{tr}(\nabla\ln A)\det A
= \operatorname{tr}(A^{-1}\nabla A)\det A = \operatorname{tr}(\tilde{A}\,\nabla A)$. If $A$ is not invertible, consider the
regularised, invertible matrix $A_\varepsilon = A + \varepsilon I$ instead and apply the theorem. Left and
right hand sides are polynomials in $\varepsilon$; taking the limit $\varepsilon \to 0$ establishes the
result.
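
As a quick sanity check of Lemma 1, it may help to write out the case $n = 2$ explicitly. The worked instance below is an illustration only; the entry names $a, b, c, d$ are our own and stand for position-dependent matrix components.

```latex
\begin{align*}
A &= \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\tilde{A} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \qquad
\det A = ad - bc , \\
\operatorname{tr}(\tilde{A}\,\nabla A)
  &= d\,\nabla a - b\,\nabla c - c\,\nabla b + a\,\nabla d
   = \nabla(ad - bc) = \nabla \det A .
\end{align*}
```

No invertibility of $A$ is needed in this computation, in line with the limiting argument of the proof.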
Abstract. Gaussian convolutions are perhaps the most often used image
operators in low-level computer vision tasks. Surprisingly though,
there are precious few articles that describe efficient and accurate
implementations of these operators.

In this paper we describe numerical approximations of Gaussian convolutions
based on interpolation. We start with the continuous convolution
integral and use an interpolation technique to approximate the continuous
image $f$ from its sampled version $F$.

Based on the interpolation a numerical approximation of the continuous
convolution integral that can be calculated as a discrete convolution sum
is obtained. The discrete convolution kernel is not equal to the sampled
version of the continuous convolution kernel. Instead the convolution of
the continuous kernel and the interpolation kernel has to be sampled to
serve as the discrete convolution kernel.

Some preliminary experiments are shown based on zero order (nearest
neighbor) interpolation, first order (linear) interpolation, third order (cubic)
interpolations and sinc-interpolation. These experiments show that
the proposed algorithm is more accurate for small scales, especially for
Gaussian derivative convolutions when compared to the classical way of
discretizing the Gaussian convolution.



1 Introduction 

Gaussian convolutions are perhaps the most often used image operators in low- 
level computer vision tasks. Surprisingly though, there are precious few articles 
that describe efficient and accurate implementations of these operators. 

Florack [2] recently published a paper comparing spatial sampling of the
Gaussian convolution kernel with frequency sampling of the Gaussian convolution
kernel. His findings are in accordance with the results of this paper. Florack
relied heavily on the frequency domain analysis of the convolution operators,
whereas in this paper the frequency domain analysis is not needed. Furthermore
the analysis by Florack is restricted to the Gaussian convolution and does not
include Gaussian derivative convolutions.

Efficient recursive approximation algorithms for Gaussian (derivative) convolutions
have been developed by Deriche and by Young and Van Vliet. Although very
fast, they lack accuracy, especially at small scales and for Gaussian derivatives.

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 205-121^ 2001. 

© Springer-Verlag and IEEE/CS 2001

Another related approach is introduced by Lindeberg, who did not consider
the task of discretizing the Gaussian convolution but instead opted for discretization
of the diffusion equation. Again for large scales the Lindeberg approach is
(almost) equal to the classical Gaussian convolution.

In this paper we describe numerical approximations of convolutions based
on interpolation. We start with the continuous convolution integral and use an
interpolation technique to approximate the continuous image $f$ from its sampled
version $F$. Based on the interpolation a numerical approximation of the
continuous convolution integral is obtained that can be calculated as a discrete
convolution sum. The discrete convolution kernel is in general not equal to the
sampled version of the continuous convolution kernel. It proves to be the sampled
version of the convolution of the continuous convolution kernel and the
continuous interpolation kernel.

Some preliminary experiments are shown for Gaussian (derivative) convolu- 
tions based on several types of interpolation. The proposed algorithm is more 
accurate for small scales, especially for Gaussian derivative convolutions, com- 
pared to the classical approach of sampling the convolution kernel. 

2 Sampling and Interpolation 

An image $f$ is a mapping from the continuous spatial domain $\mathbb{R}^d$ to the real
numbers $\mathbb{R}$ (in this paper we only consider scalar images). The image $f$ thus gives
the value $f(\mathbf{x})$ for each location $\mathbf{x}$, as if we had placed the observation
probe at location $\mathbf{x}$.

In practice all that can be done is to sample the spatial domain in a finite
number of locations. We assume the sampling grid is generated by the basis
$B = (\mathbf{b}_1, \ldots, \mathbf{b}_d)$ leading to the observations $f(B\mathbf{k})$ for all integer multi-indices
$\mathbf{k} \in \mathbb{Z}^d$.

Definition 1 (Sampling). Let $f \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ be an image and let $B$ represent
the sampling grid basis, then we define the sampling operator $\mathcal{S}_B : \mathrm{Fun}(\mathbb{R}^d, \mathbb{R}) \to
\mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$ by:

$$(\mathcal{S}_B f)(\mathbf{k}) = f(B\mathbf{k})$$

for all $\mathbf{k} \in \mathbb{Z}^d$.

In this report only the standard orthonormal sampling grid is considered, i.e.
$B = I$, the identity matrix. Nevertheless we prefer to write $B\mathbf{k}$, to stress the
fact that writing $B\mathbf{k}$ serves as a transition from $\mathbf{k} \in \mathbb{Z}^d$ (the discrete domain) to
$B\mathbf{k} \in \mathbb{R}^d$ (the continuous domain).

The practical necessity to sample an image in only a small part of the entire 
space is not taken into account. Instead we consider infinite spatial domains; a 
simplification that does not greatly influence the results in this report. 

Given a sampled version $F = \mathcal{S}_B f$ of an image $f$ stored in computer memory,
the goal is to process the image. In principle we are not interested in processing
its representation $F$; we would like to process $f$ and then sample the result again
in order to store the result in computer memory as well. This approach thus
concentrates on numerical approximations of the continuous operator instead
of concentrating on sampling the images and applying a discrete operator. In
Section 3 we look at numerical approximations of continuous convolutions.

In order to numerically approximate the continuous convolution we need to
be able to (approximately) reconstruct $f$ given its sampled representation $F$.
In this paper the restriction to linear and translation invariant interpolation
schemes is made.

Definition 2 (Interpolation). Let $F \in \mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$ be a sampled version $\mathcal{S}_B f$
of some continuous image $f \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$, then we define the interpolation
$\mathcal{I}_{\phi,B} F$ of $F$ by:

$$(\mathcal{I}_{\phi,B} F)(\mathbf{x}) = \sum_{\mathbf{k} \in \mathbb{Z}^d} \phi(\mathbf{x} - B\mathbf{k})\, F(\mathbf{k}) \qquad (1)$$

where $\phi \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ is called the interpolation kernel.

Note that $\mathcal{I}_{\phi,B}$ can only be truly called an interpolation in case $\mathcal{S}_B \mathcal{I}_{\phi,B} = \mathrm{id}$.
This implies that for the interpolation kernel we have $\phi(B(\mathbf{k} - \mathbf{l})) = \delta_{\mathbf{k},\mathbf{l}}$ where
$\delta_{\mathbf{k},\mathbf{l}} = 1$ iff $\mathbf{k} = \mathbf{l}$ and $\delta_{\mathbf{k},\mathbf{l}} = 0$ otherwise.

This requirement on interpolation means that interpolation followed by sampling
should result in the original set of sample values (observe that in this case
the identity operator id is in $\mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$). On the other hand, a sampling followed
by an interpolation, i.e. $\mathcal{I}_{\phi,B}\mathcal{S}_B$, need not (and in general will not) result
in the original function, i.e. $\mathcal{I}_{\phi,B}\mathcal{S}_B \neq \mathrm{id}$ (now the identity is in $\mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$).

Fig. 1. Interpolation Kernels. Shown are 4 one-dimensional interpolation
kernels for the unit sampling grid. The 0-order kernel $\phi_0$ is 1 in the interval
$[-0.5, 0.5]$ and 0 outside. The first order kernel $\phi_1$ achieves a linear interpolation
between the samples, the kernel $\phi_3$ is a third order interpolating kernel and
$\phi_\infty$ is the well-known sinc interpolation kernel $\phi_\infty(x) = \sin(\pi x)/(\pi x)$.

Well-known interpolation schemes like nearest neighbor interpolation, bilinear
interpolation, bicubic interpolation and sinc interpolation all fit within this
framework. These are all examples of polynomial interpolators (the subscript $k$
in $\phi_k$ denotes the order of the interpolating polynomial).

The interpolation kernels depicted in Fig. 1 are defined as:

208 



Rein van den Boomgaard and Rik van der Weij 






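$$\phi_0(x) = \begin{cases} 1 & : |x| \le \tfrac{1}{2} \\ 0 & : \text{elsewhere} \end{cases}, \qquad
\phi_1(x) = \begin{cases} 1 - |x| & : |x| < 1 \\ 0 & : \text{elsewhere} \end{cases},$$

$$\phi_3(x) = \begin{cases}
1 - \tfrac{5}{2}|x|^2 + \tfrac{3}{2}|x|^3 & : 0 \le |x| \le 1 \\
2 - 4|x| + \tfrac{5}{2}|x|^2 - \tfrac{1}{2}|x|^3 & : 1 < |x| < 2 \\
0 & : \text{elsewhere}
\end{cases}$$

and

$$\phi_\infty(x) = \frac{\sin(\pi x)}{\pi x} .$$

Note that $\phi_3$ is only one example of a third order interpolation kernel.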

The important property to note here is that all these interpolating schemes 
calculate the interpolated value as a linear combination of the sample values in 
the discrete representation of the image. For a comprehensive overview of linear 
interpolation techniques in (medical) image processing we refer to MeijeringjSj. 
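
To make the interpolation framework concrete, the four kernels and the interpolation sum of Eq. (1) can be sketched in a few lines for the one-dimensional unit grid ($B = I$). This is an illustrative sketch only: the function names are ours, the samples are assumed to sit at $k = 0, \ldots, N-1$, and the half-open support chosen for `phi0` is an assumption made so that shifted copies do not overlap.

```python
import numpy as np

def phi0(x):
    """Zero order (nearest neighbour) kernel; half-open support is an assumption."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= -0.5) & (x < 0.5), 1.0, 0.0)

def phi1(x):
    """First order (linear) kernel."""
    return np.maximum(0.0, 1.0 - np.abs(np.asarray(x, dtype=float)))

def phi3(x):
    """Third order (cubic) kernel with the coefficients listed above."""
    a = np.abs(np.asarray(x, dtype=float))
    inner = 1.0 - 2.5 * a**2 + 1.5 * a**3
    outer = 2.0 - 4.0 * a + 2.5 * a**2 - 0.5 * a**3
    return np.where(a <= 1.0, inner, np.where(a < 2.0, outer, 0.0))

def phi_inf(x):
    """Sinc kernel sin(pi x)/(pi x); numpy.sinc uses this normalisation."""
    return np.sinc(np.asarray(x, dtype=float))

def interpolate(F, phi, x):
    """Evaluate sum_k phi(x - k) F(k) for samples F(0), ..., F(N-1)."""
    F = np.asarray(F, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(F.size)
    return phi(x[:, None] - k[None, :]) @ F
```

Evaluating `interpolate` at the integer positions returns the original samples for each of the four kernels, which is the requirement that interpolation followed by sampling is the identity.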



3 Discrete Approximations of Continuous Convolutions 

In this paper we will make an explicit distinction between a continuous convolution
integral and its discrete implementation. Therefore we give the formal definition
of both the convolution integral and the discrete convolution sum.

Definition 3 (Continuous Convolution Integral). The convolution of a
continuous image $f \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ with a kernel $w \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ resulting in
an image $f * w \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ is defined by:

$$(f * w)(\mathbf{x}) = \int_{\mathbb{R}^d} f(\mathbf{x} - \mathbf{y})\, w(\mathbf{y})\, d\mathbf{y} . \qquad (2)$$

In this definition (and throughout this paper) we assume that the functions 
involved in a convolution are defined in such a way that the convolution is well- 
defined. 

Because only the discrete representation $F$ of the image $f$ is available we have
to resort to an approximation of the convolution integral. The classical way to
approximate the sampled version of $f * w$ in Eq. (2) is to sample both the image
$f$ (resulting in $F = \mathcal{S}_B f$) and the convolution kernel $w$ (resulting in $W = \mathcal{S}_B w$)
and then calculate a discrete convolution sum.

Definition 4 (Discrete Convolution Sum). The convolution of a discrete
image $F \in \mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$ with a kernel $W \in \mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$ resulting in an image
$F \star W \in \mathrm{Fun}(\mathbb{Z}^d, \mathbb{R})$ is defined by:

$$(F \star W)(\mathbf{k}) = \sum_{\mathbf{l} \in \mathbb{Z}^d} F(\mathbf{k} - \mathbf{l})\, W(\mathbf{l}) . \qquad (3)$$

The discrete image $F \star W$ is, in general, not the sampled version of the continuous
convolution $f * w$; it is only a sampled approximation:

$$\mathcal{S}_B(f * w) \approx \mathcal{S}_B f \star \mathcal{S}_B w .$$

It is important to note that the above approximation for $\mathcal{S}_B(f * w)$ is just one of
the possible approximations that will be presented in this paper. In the computer
vision literature it is almost always the only approximation that is presented (and
often without argumentation). In Section 4 it will be argued that sampling both
the image and the kernel is exact in case both the image $f$ and the kernel $w$
are band-limited. Sampling the kernel is most often not needed as it is known in
analytical form (like the Gaussian kernel that we are interested in).

A second classical approach to approximate $\mathcal{S}_B(f * w)$ is to calculate the
convolution integral in the (sampled) frequency domain (using a discrete Fourier
transform). The main difference with the first approach is that sampling is not
done in the spatial domain but in the frequency domain. For kernels that are
poorly sampled in the spatial domain (because they are not band-limited) this
may be advantageous (see Florack [2] and Oppenheimer et al.).

In this paper we propose not to sample the kernel $w$ directly. Instead an
interpolation is used to approximate the continuous image function $f$ from its
sampled representation $F$. The continuous convolution integral then becomes a
sum of integrals. The integrals can be calculated analytically, whereas the sum
turns out to be a discrete convolution.

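Proposition 1 (From Continuous to Discrete Convolution). The continuous
convolution $f * w$ is approximated by the discrete convolution $F \star W_\phi$, where
$F$ is the sampling of $f$. The discrete kernel $W_\phi$ is the sampling of $w * \phi$, where $\phi$
is the interpolation kernel used to approximate $f$ from its sampled representation
$F$. I.e.

$$\mathcal{S}_B(f * w) \approx \mathcal{S}_B f \star \mathcal{S}_B(w * \phi) = F \star W_\phi .$$

Proof. Because only the samples $F = \mathcal{S}_B f$ are known, we have to approximate
$f$ using an interpolation $\mathcal{I}_{\phi,B} F$ to obtain an approximation of the convolution
integral:

$$f * w \approx \mathcal{I}_{\phi,B} F * w .$$

Let $T_{\mathbf{y}}$ be the translation operator over the vector $\mathbf{y}$ of an image in $\mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$,
then we can write:

$$\mathcal{I}_{\phi,B} F = \sum_{\mathbf{k} \in \mathbb{Z}^d} F(\mathbf{k})\, T_{B\mathbf{k}}\phi .$$

Substituting this into the above approximation of the convolution and using the
fact that convolution is linear and translation invariant we get:

$$f * w \approx \Big( \sum_{\mathbf{k} \in \mathbb{Z}^d} F(\mathbf{k})\, T_{B\mathbf{k}}\phi \Big) * w
= \sum_{\mathbf{k} \in \mathbb{Z}^d} F(\mathbf{k})\, T_{B\mathbf{k}}(w * \phi) .$$

Sampling this function in $\mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ leads to

$$\mathcal{S}_B(f * w) \approx \sum_{\mathbf{k} \in \mathbb{Z}^d} F(\mathbf{k})\, \mathcal{S}_B\big(T_{B\mathbf{k}}(w * \phi)\big) .$$

It can be easily shown that sampling commutes with translation (over a grid
vector) such that $\mathcal{S}_B T_{B\mathbf{k}} = T_{\mathbf{k}} \mathcal{S}_B$, where $T_{\mathbf{k}}$ is the translation operator in discrete
space. This leads to:

$$\mathcal{S}_B(f * w) \approx \sum_{\mathbf{k} \in \mathbb{Z}^d} F(\mathbf{k})\, T_{\mathbf{k}} \mathcal{S}_B(w * \phi)$$

i.e. the discrete convolution of $F = \mathcal{S}_B f$ and $W_\phi = \mathcal{S}_B(w * \phi)$:

$$\mathcal{S}_B(f * w) \approx \mathcal{S}_B f \star \mathcal{S}_B(w * \phi) = F \star W_\phi .$$

QED.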

The analysis in this section thus showed that

- A continuous convolution integral $f * w$ is approximated with a discrete
  convolution sum $F \star W_\phi$.

- The discrete kernel $W_\phi$ is, in general, not equal to the sampled version of
  the continuous kernel: it should be chosen to be the sampled version of the
  continuous kernel $w$ convolved with the interpolating kernel $\phi$, i.e. $W_\phi =
  \mathcal{S}_B(w * \phi)$.

- The discrete approximation of a continuous convolution integral is tightly
  coupled with interpolation. The convolution integral can be approximated
  in any (subpixel) position $\mathbf{x} \in \mathbb{R}^d$.

- The analysis is not dependent on the choice of the kernel $w$ (assuming the
  integral is well defined). Therefore the entire $N$-jet using Gaussian derivatives
  can be approximated at all scales at all (subpixel) positions.
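
The recipe $W_\phi = \mathcal{S}_B(w * \phi)$ can be tried out numerically in one dimension. The following is a minimal sketch under stated assumptions, not the implementation used for the experiments: the helper `discrete_kernel`, its parameters, and the trapezoidal quadrature are our own choices, the kernel $\phi$ is assumed to have compact support $[-\texttt{phi\_support}, \texttt{phi\_support}]$, and the Gaussian is parametrised by its standard deviation `s`.

```python
import numpy as np

def gauss(x, s):
    """Gaussian with standard deviation s (assumed parametrisation)."""
    return np.exp(-x**2 / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))

def phi1(x):
    """First order (linear) interpolation kernel."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def discrete_kernel(w, phi, radius, phi_support, n=4001):
    """Sample (w * phi)(k) for k = -radius..radius on the unit grid.

    The integral over y of w(k - y) phi(y) is approximated with the
    trapezoidal rule on n points covering the support of phi.
    """
    y = np.linspace(-phi_support, phi_support, n)
    dy = y[1] - y[0]
    k = np.arange(-radius, radius + 1)
    integrand = w(k[:, None] - y[None, :]) * phi(y)[None, :]
    W = 0.5 * dy * (integrand[:, :-1] + integrand[:, 1:]).sum(axis=1)
    return k, W
```

For instance, `discrete_kernel(lambda x: gauss(x, 0.5), phi1, 6, 1.0)` gives a kernel whose central weight is noticeably smaller than the sampled value `gauss(0.0, 0.5)`, reflecting the extra smoothing by the interpolation kernel.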

4 Band-Limited Functions 

The concept of band-limited functions rooted in the frequency domain analysis 
of signals and (linear) systems is so familiar to anyone working with sampled 
functions, that an analysis from this point of view on the convolution approxi- 
mations discussed in the previous section is bound to be a fruitful exercise. 

It has been shown in this paper that a discrete approximation of the convolution
$f * w$ is obtained as a discrete convolution $F \star \mathcal{S}_B(w * \phi)$ where $F = \mathcal{S}_B f$
is the sampled version of $f$ and $\mathcal{S}_B(w * \phi)$ is the sampled version of $w * \phi$.

In this section we assume that the sampling grid is generated by the standard
orthonormal basis $B = I$, i.e. we sample $f$ in the integer valued points:

$$F(\mathbf{k}) = f(B\mathbf{k}) = f(\mathbf{k}) .$$

For a properly sampled band-limited function $f$ the interpolation using the sinc
function $\phi_\infty$ is exact and thus the convolution integral can be calculated and
sampled without error.

Proposition 2 (Convolution of Band-Limited Function). Let $f$ be a band-limited
function and let $w$ be any kernel, then:

$$\mathcal{S}_B(f * w) = \mathcal{S}_B f \star \mathcal{S}_B(w * \phi_\infty)$$

The proof of this proposition follows from the observation that in our approximation
of the convolution integral in the previous section, the only approximation
is of the function $f$ itself (through the interpolation). For a properly sampled
band-limited function, sinc-interpolation is exact and thus the convolution is
exact.

The above proposition is true for any kernel $w$, even for kernels $w$ that are
not band-limited. In fact through the convolution $w * \phi_\infty$ we assure that the
function to be sampled is band-limited.

Proposition 3 (Convolution of a Band-Limited Function with a Band-Limited
Kernel). Let $f$ be a band-limited function and let $w$ be a band-limited
kernel, then:

$$\mathcal{S}_B(f * w) = \mathcal{S}_B f \star \mathcal{S}_B w$$

Again the proof is trivial because for a band-limited function $w$ we have that
$w * \phi_\infty = w$. In this case convolution and sampling commute (be it that a
continuous convolution is replaced by a discrete convolution).

In the practical use of scale-space techniques we often find that neither the
image nor the convolution kernel (e.g. the Gaussian function at small scale) is
band-limited. The results of the previous section then still provide a numerically
sound way to approximate the convolution, even at small scales and in subpixel
positions.



5 Separable Convolution Kernels 

One of the practical advantages of using the Gaussian function (and its derivatives)
in computer vision is that the Gaussian function is separable. For a separable
function $w \in \mathrm{Fun}(\mathbb{R}^d, \mathbb{R})$ we can find functions $w_i \in \mathrm{Fun}(\mathbb{R}, \mathbb{R})$ such that
$w(\mathbf{x}) = w_1(x_1)\, w_2(x_2) \cdots w_d(x_d)$. The practical relevance is that the convolution
of an image $f$ with a separable kernel $w$ is equivalent to the composition of $d$
convolutions each using a ‘one dimensional’ convolution kernel.

Let $w$ be a separable convolution kernel. The discrete convolution using kernel
$\mathcal{S}_B(w * \phi)$ is separable as well in case the interpolation kernel $\phi$ is also separable.

In this paper the restriction to the discrete convolutions needed in Gaussian 
scale-space theory is made. All these convolutions using Gaussian kernels and 
derivatives of the Gaussian function are separable. 

The interpolation methods that we will use are separable as well. Therefore in
the remaining sections of this paper we only look at one dimensional convolutions.
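
The separability argument is easy to verify for discrete kernels. The sketch below is an illustration with names of our own choosing (`convolve2d_full`, `convolve_separable`); it compares a direct two-dimensional convolution sum with the composition of two one-dimensional convolutions for a kernel that is an outer product.

```python
import numpy as np

def convolve2d_full(F, W):
    """Direct 2-D discrete convolution sum with 'full' output size."""
    F = np.asarray(F, dtype=float)
    W = np.asarray(W, dtype=float)
    out = np.zeros((F.shape[0] + W.shape[0] - 1, F.shape[1] + W.shape[1] - 1))
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            # Each kernel weight contributes a shifted, scaled copy of F.
            out[i:i + F.shape[0], j:j + F.shape[1]] += W[i, j] * F
    return out

def convolve_separable(F, w1, w2):
    """Convolve along the first axis with w1, then along the second with w2."""
    rows = np.apply_along_axis(np.convolve, 0, np.asarray(F, dtype=float), w1)
    return np.apply_along_axis(np.convolve, 1, rows, w2)
```

With `W = np.outer(w1, w2)` both routes give the same result, while the separable route needs on the order of `len(w1) + len(w2)` multiplications per output sample instead of `len(w1) * len(w2)`.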



6 Gaussian Convolutions Based on Interpolation 

No scale-space paper is complete without the definition of the Gaussian function.
We only need the one dimensional Gaussian function:

$$g^s(x) = \frac{1}{s\sqrt{2\pi}} \exp\left(-\frac{x^2}{2s^2}\right) .$$

In scale-space theory and practice, convolutions using Gaussian derivatives are
just as important. The formalism developed in previous sections is valid for any
kernel. Here we consider the $n$-th order derivative $\partial^n g^s$ of the one dimensional
Gaussian function.

The interpolation schemes $\phi_0$, $\phi_1$, $\phi_3$ and $\phi_\infty$ that are considered in this
section are defined in Section 2; see also Fig. 1.






Fig. 2. Discrete Gaussian Convolution Kernels. Shown are the discrete
kernels based on (normalized) sampling, zero order, first order interpolation,
third order interpolation and sinc-interpolation. The scale of the Gaussian kernel
is 0.5 (sampling grid distance).



The discrete kernels $\mathcal{S}_B(\partial^n g^s * \phi_k)$ for $n = 0, 1, 2$ and $k = 0, 1, 3, \infty$ at scale
$s = 0.5$ are depicted in Fig. 2. The convolutions $\partial^n g^s * \phi_k$ are numerically
approximated (with matlab).

For comparison we also give the sampled kernel $S(\partial^n g^s)$. We use the $S$
notation to indicate that the kernel is normalized (such that $S_B(g^s)$ sums up to
one).
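
For zero order interpolation and $n = 0$ the kernel can be written in closed form, because $g^s * \phi_0$ is the integral of the Gaussian over a unit interval. The sketch below is our own illustration, not the matlab code referred to above; it assumes that $s$ is the standard deviation of the Gaussian, and the function names are hypothetical.

```python
import math

def gauss(x, s):
    """Gaussian with standard deviation s (assumed parametrisation)."""
    return math.exp(-x * x / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def kernel_zero_order(s, radius):
    """Samples of g^s * phi_0: Gaussian mass on [k - 1/2, k + 1/2]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (s * math.sqrt(2.0))))
    return [cdf(k + 0.5) - cdf(k - 0.5) for k in range(-radius, radius + 1)]

def kernel_sampled_normalized(s, radius):
    """Classical sampled Gaussian, rescaled so that the weights sum to one."""
    g = [gauss(k, s) for k in range(-radius, radius + 1)]
    total = sum(g)
    return [v / total for v in g]
```

At $s = 0.5$ the central weight of the zero order kernel is the one-sigma probability of the Gaussian, which is clearly lower than the central weight of the normalized sampled kernel; the two constructions differ most at such small scales.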

Fig. 3. Gaussian Convolution Approximation. Shown are the discrete
convolutions approximating the Gaussian (derivative) convolution based on
(normalized) sampling, zero order, first order interpolation, third order interpolation
and sinc-interpolation. The scale of the Gaussian kernel is 0.5 (sampling grid
distance).



As a test function we define: 

f{x) = sin(y)cos(y). 

For this function the convolutions $f * \partial^n g^s$ can be calculated analytically (using
Mathematica). This allows us to compare the discrete convolution $\mathcal{S}_B f \star
\mathcal{S}_B(\partial^n g^s * \phi_k)$ with the true value $\mathcal{S}_B(f * \partial^n g^s)$. The discrete convolutions using
a Gaussian kernel at scale $s = 0.5$ are depicted in Fig. 3.

For the function $f$ the sinc interpolation based discrete convolution is most
accurate (deviations from the true value are probably due to the truncated support
of the sinc kernel and to the numerical approximations of the convolution
integrals that are used).

From Fig. 3 it can also be concluded that for small scales, any interpolation
is better than the classical sampling scheme. For larger scales this is not true
anymore. Then $g^s * \phi_\infty \approx g^s$ and thus sinc interpolation is equivalent
to the classical sampling scheme. The other (simpler) interpolation schemes
then introduce an unwanted smoothing effect (although the influence on the
numerical approximations is negligible).

7 Conclusions 

In this paper we presented a simple scheme for approximating Gaussian convo- 
lutions that outperforms the classical spatial convolutions (based on sampling 
the continuous convolution kernel). Especially at small scales and for Gaussian 
derivatives the performance (in terms of accuracy) is much better. 

The presented formalism is tightly connected with interpolation. Sub-pixel 
accurate estimates of the Gaussian derivatives at small scales are therefore easily 
obtained. 

The presented preliminary experiments are based on several types of (linear)
interpolation. Future work will consider other more advanced interpolation
methods as well. Furthermore we plan to compare our spatial sampling
implementation with the frequency sampling method of Florack [2].

This paper presents accurate approximation algorithms for small scale Gaus- 
sian convolutions. To that end we presented a simple scheme that shows in what 
way any continuous convolution can be approximated with a discrete convolu- 
tion. The importance of the sampling operator and the interpolation operator 
for discretizing continuous image processing operators seems to be new for lin- 
ear operators (i.e. convolutions). In the morphological context it is not new (see 
Heijmans jSj), but in that context the operator that reconstructs a continuous 
image from its sampled representation is not called an interpolation. 

Abstract. In the literature an image scale-space is usually defined as
the solution of an initial value problem described by a PDE, such as a
linear or nonlinear diffusion equation. Alternatively, scale-spaces can be
defined in an axiomatic way starting from a fixed-scale image operator
(e.g. a linear convolution or a morphological erosion) and a group of
scalings. The goal of this paper is to explain the relation between these
two, seemingly very different, approaches.



1 Introduction 

In two previous papers we have presented an algebraic definition of scale-spaces.
In our view a scale-space is the mathematical construct that describes
the scale-dependent observation (probing) of images. We only look at the
scale-space operators that are capable of making observations at a finite scale without
the necessity to make all observations at smaller scales as well.

Our construction technique for scale-space operators consists of three consecutive
steps: (i) downscale the image by a factor $t > 0$ using a scaling operator
$S(t)^{-1} = S(1/t)$, (ii) apply an image operator $\psi$ at unit scale, and (iii) resize
the image to its original scale using $S(t)$. Thus we arrive at

$$T(t) = S(t)\, \psi\, S(t)^{-1} \qquad (1)$$

as the scale-space operator. The scaling operators $S(t)$ are assumed to form a
group under composition. Refer to Section 3 for more details. In previous work, where we
have presented an exhaustive treatment of morphological scale-spaces of the form
(1), we have made a distinction between additive scale-spaces satisfying

$$T(t)\, T(s) = T(t + s) , \quad t, s > 0 ,$$

and supremal scale-spaces satisfying

$$T(t)\, T(s) = T(t \vee s) , \quad t, s > 0 .$$

In this paper we will focus on additive scale-spaces since only these possibly
allow a description by a PDE (partial differential equation).
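
The three-step construction can be illustrated with a toy example for one-dimensional signals represented as Python functions. Everything in this sketch is an assumption made for illustration: the scaling is taken to be the spatial scaling $(S(t)f)(x) = f(x/t)$, the unit-scale operator $\psi$ is a flat erosion over the interval $[-1, 1]$, and the minimum is approximated on a finite grid `H`.

```python
import numpy as np

# Sampled unit structuring element [-1, 1] (grid resolution is an assumption).
H = np.linspace(-1.0, 1.0, 201)

def S(t):
    """Spatial scaling by t: (S(t) f)(x) = f(x / t)."""
    return lambda f: (lambda x: f(np.asarray(x, dtype=float) / t))

def psi(f):
    """Unit-scale flat erosion: (psi f)(x) = min over |h| <= 1 of f(x - h)."""
    def eroded(x):
        x = np.asarray(x, dtype=float)
        return np.min(f(x[..., None] - H), axis=-1)
    return eroded

def T(t):
    """Scale-space operator T(t) = S(t) psi S(t)^{-1}, with S(t)^{-1} = S(1/t)."""
    return lambda f: S(t)(psi(S(1.0 / t)(f)))
```

For this choice $T(t)$ is the erosion over $[-t, t]$, so composing `T(1.0)` with `T(2.0)` agrees with `T(3.0)` up to the grid resolution, which is an instance of the additive semigroup property.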

At first sight, it appears that our approach is quite different from the PDE 
approach that has been advocated by various authors. In that approach one 

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 215-[22^ 2001. 

© Springer-Verlag and IEEE/CS 2001

takes the evolution of the zero scale image modeled by a partial differential
equation as the starting point. In this paper, we will demonstrate that the two
approaches are not as different as they may seem at first sight. In fact, it is rather
straightforward to formulate a condition on the differential operator governing
the PDE (see below) that guarantees that the corresponding solution family
(often a $C_0$-semigroup) is of the form (1).

We conclude this section with an overview of this paper. In the next section
we present a brief, and hence rather incomplete, discussion of some PDE’s often
encountered in the image processing and computer vision literature. In Section 3
we present our algebraic framework and apply it to two different linear
scale-spaces. In Section 4 we consider the morphological scale-spaces governed by
erosion and show that they can be associated with PDE’s of Hamilton-Jacobi
type. Section 5 is concerned with scale-invariance of PDE’s. It is shown that the
solution operator of a PDE which is invariant under scalings corresponds with an
(additive) scale-space in the algebraic sense. Our treatment of scale-invariance
in Section 5 is self-contained. Alternatively, one might also choose to use Lie
groups US! for the description of invariance under scalings and, possibly, other
symmetries. We refer the reader to PI for some work in this direction.

2 PDE’s in Image Processing 

In this section we list some of the PDE’s that are often encountered in the image 
processing literature. We emphasise, however, that our list is far from exhaustive. 
A more comprehensive overview of various scale-spaces and the corresponding 
PDE’s can be found in [^imi77| . 

There can be no doubt that the PDE most often encountered in image processing
is the linear diffusion equation or heat equation,

$$u_t = \Delta u . \tag{2}$$

Perona and Malik [EHJP| were the first to consider a nonlinear version of the
form

$$u_t = \operatorname{div}\bigl(c(\|\nabla u\|)\,\nabla u\bigr) , \tag{3}$$

with the conductance function $c(\cdot)$ given by

$$c(s) = \exp(-s^2/k^2) \quad\text{or}\quad c(s) = (1 + s^2/k^2)^{-1} . \tag{4}$$

The underlying idea is that by reducing the diffusivity in regions with high
gradient, one could possibly avoid too strong a blurring in such regions. Perona and
Malik referred to equation (3) as the anisotropic diffusion equation. Unfortunately,
(3) in combination with a choice for $c$ such as in (4) is not well-posed, as
shown by Catté et al. 0; see also Weickert m.
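To make the role of the conductance function concrete, the following is a minimal sketch of one explicit finite-difference step of (3) in one dimension, using the two choices of $c$ in (4). The discretization (unit grid spacing, zero-flux ends, step size `dt`) and all function names are assumptions introduced here for illustration; they are not taken from Perona and Malik or from this paper.

```python
import math


def c_exp(s, k):
    """First conductance in (4): c(s) = exp(-s^2 / k^2)."""
    return math.exp(-(s * s) / (k * k))


def c_rational(s, k):
    """Second conductance in (4): c(s) = 1 / (1 + s^2 / k^2)."""
    return 1.0 / (1.0 + (s * s) / (k * k))


def perona_malik_step(u, k, dt=0.2, c=c_exp):
    """One explicit step of u_t = (c(|u_x|) u_x)_x on a unit grid.

    Assumed discretization: forward differences for u_x and zero flux
    through both ends of the signal, so the sum of u is conserved.
    """
    n = len(u)
    flux = [0.0] * (n + 1)  # flux[i] sits between samples i-1 and i
    for i in range(1, n):
        grad = u[i] - u[i - 1]
        flux[i] = c(abs(grad), k) * grad
    return [u[i] + dt * (flux[i + 1] - flux[i]) for i in range(n)]


# A step edge of height 10 with k = 1: the conductance at the edge is
# exp(-100), so the edge is left essentially untouched.
edge = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
edge_after = perona_malik_step(edge, k=1.0)

# A small ripple (differences well below k) is smoothed much like
# linear diffusion would smooth it.
ripple = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
ripple_after = perona_malik_step(ripple, k=1.0)
```

In this sketch the parameter `k` plays the role of a gradient threshold: differences much larger than `k` are barely diffused, while differences much smaller than `k` are smoothed.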

Alvarez, Guichard, Lions and Morel m derived a family of nonlinear models
of the form

$$u_t(t,x) = F\bigl(\nabla^2 u(t,x), \nabla u(t,x), u(t,x), x, t\bigr) \tag{5}$$

under the assumption that the multiscale analysis under consideration is causal
and regular; refer to P for the precise meaning of the word ‘causal’. Here $F$
is a function mapping $\mathcal{M} \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}_+$ into $\mathbb{R}$, with $\mathcal{M}$ being the
space of symmetric $d \times d$ matrices, which is nondecreasing in its first argument with
respect to the partial ordering of $\mathcal{M}$. Alvarez et al. P showed that under the
given assumptions, (5) allows a so-called viscosity solution. The linear diffusion
equation (2) corresponds with the case where

$$F(w, v, u, x, t) = \operatorname{trace}(w) .$$

If, in addition, the multiscale analysis is invariant under isometries (Euclidean
invariance) and under changes of contrast, then (5) can be simplified to

$$u_t = \|\nabla u\|\, G\bigl(\operatorname{div}(\nabla u/\|\nabla u\|), t\bigr) = \|\nabla u\|\, G(\operatorname{curv}(u), t) . \tag{6}$$

Here $\operatorname{curv}(u) = \operatorname{div}(\nabla u/\|\nabla u\|)$ is the curvature of the level line of $u(t,x)$
passing through $x$. Furthermore, $G$ is nondecreasing in its first argument. If $G$ is
identically $-1$, then we arrive at

$$u_t = -\|\nabla u\| , \tag{7}$$

which corresponds with the erosion with the Euclidean disk; see Example 4 in
Section 4. This equation is a special case of the Hamilton-Jacobi equation

$$u_t = -H(\nabla u) , \tag{8}$$

where the Hamiltonian $H$ is a convex function. In Section 4 we will explain
that the solution of this equation can be obtained by means of a morphological
grey-scale erosion.

Dually, putting $G \equiv +1$ in (6) yields

$$u_t = \|\nabla u\| , \tag{9}$$

which corresponds with a dilation. Equation (9) is also called the eikonal
equation H3|. Putting $G(s,t) = s$ in (6) we arrive at the so-called mean curvature
equation m

$$u_t = \|\nabla u\| \operatorname{curv}(u) . \tag{10}$$

If the condition of Euclidean invariance is strengthened by assuming that the
multiscale analysis is invariant under all affine transformations, then we arrive
at the time-homogeneous equation

$$u_t = \|\nabla u\|\,(\operatorname{curv}(u))^{1/3} , \tag{11}$$

called the affine morphological scale-space (AMSS) model by Alvarez et al. p.
Sapiro and Tannenbaum [20], who were following the curve evolution approach,
arrived at the same equation independently.

We denote the solution operator of a PDE such as (5) by $U(t,s)$, where
$t > s$. This means that the solution of the PDE endowed with initial condition
$u(s,x) = f(x)$ is given by $u(t) = U(t,s)f$ for $t > s$. The family $U(t,s)$,
which is sometimes called multiscale analysis, satisfies the evolution property
$U(t,s)U(s,r) = U(t,r)$ for $r < s < t$. In this paper we shall be concerned
exclusively with time-homogeneous PDE’s. For the PDE in (5) this means that $F$
does not depend on $t$, in which case it reduces to

$$u_t(t,x) = F\bigl(\nabla^2 u(t,x), \nabla u(t,x), u(t,x), x\bigr) . \tag{12}$$

In this case the multiscale analysis $U(t,s)$ can be written as $T(t - s)$, where
the family $T(t)$, $t > 0$, satisfies the semigroup property $T(t + s) = T(t)T(s)$.
Below it will become clear that, under quite general assumptions, $T(t)$ defines a
scale-space.

3 Algebraic Definition of Scale-Spaces

In 01^ we have proposed an algebraic definition of a scale-space. The ingredients
of that definition are (i) a one-parameter family of scalings $S(t)$, and (ii) an
image operator $\psi$ which acts on images at a given unit scale.^1 With these
ingredients we can define a family of operators which governs the action of $\psi$ at
various scales.

^1 An inspiring discussion, from the physical viewpoint, about the role of scale in the
description of the structure of images, can be found in the monograph of Florack 0.

Let us describe this construction in more detail. Let $\mathcal{L}$ be the collection
of images under consideration. For the time being, we take $\mathcal{L}$ to be the set
of functions mapping $\mathbb{R}^d$ into $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$. A one-parameter family
$S = \{S(t) \mid t > 0\}$ of operators on $\mathcal{L}$ is called a scaling if

$$S(1) = \mathrm{id} \quad\text{and}\quad S(t)S(s) = S(ts) , \qquad s, t > 0 .$$

Thus $S$ is a commutative group with $S(t)^{-1} = S(1/t)$, $t > 0$. In PI different
methods for the construction of scalings have been discussed. In this paper, the
family $S^{p,q}$ with $p, q \in \mathbb{R}$ given by

$$S^{p,q}(t) f = t^q f(\cdot / t^p) , \qquad t > 0 , \tag{13}$$

plays an important role. Of particular interest are the following scalings:

- spatial scaling: $p = 1$, $q = 0$: a function $f$ is scaled in the spatial domain
but not in the grey-level domain.

- quadratic scaling: $p = 1/2$, $q = 0$.

- umbral scaling: $p = 1$, $q = 1$: the corresponding scaling $S^{1,1}$ scales the region
in $\mathbb{R}^d \times \mathbb{R}$ beneath the graph of $f$ (in the morphological literature, this region
is called the umbra of $f$).

Furthermore, observe that the case $p = 0$ and $q = 1$ corresponds with a grey-level
multiplication: $S^{0,1}(t)f = tf$.

Now suppose we are given a scaling $S$ and an image operator $\psi$. Let $T_\psi(t)$
be the operator which governs the action of $\psi$ at scale $t$; that is, to compute
$T_\psi(t)f$ for an input image $f$ we first ‘downscale’ the image by the factor $t$ to
obtain $S(t)^{-1}f$, then we apply $\psi$, and finally we resize the image to its original
scale by applying $S(t)$. Thus we get

$$T_\psi(t) = S(t)\,\psi\,S(t)^{-1} , \qquad t > 0 . \tag{14}$$
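The three-step construction in (14) can be sketched directly on one-dimensional functions. The snippet below is only an illustration under stated assumptions: images are Python callables, the kernel operator is a flat erosion over $[-1, 1]$ whose infimum is taken over a finite sample of points, and the names `scaling`, `flat_erosion` and `scale_space` are hypothetical helpers that do not appear in the paper.

```python
import math


def scaling(p, q):
    """Return t -> S^{p,q}(t), where (S^{p,q}(t) f)(x) = t**q * f(x / t**p)."""
    def S(t):
        def apply(f):
            return lambda x: (t ** q) * f(x / (t ** p))
        return apply
    return S


def flat_erosion(f, n=200):
    """Kernel operator psi: infimum of f(x - h) over h in [-1, 1], sampled
    on n + 1 equally spaced points (a discretization assumed here)."""
    hs = [-1.0 + 2.0 * i / n for i in range(n + 1)]
    return lambda x: min(f(x - h) for h in hs)


def scale_space(S, psi):
    """Return t -> T(t) with T(t) = S(t) psi S(t)^{-1} and S(t)^{-1} = S(1/t)."""
    def T(t):
        return lambda f: S(t)(psi(S(1.0 / t)(f)))
    return T


S = scaling(1.0, 0.0)            # spatial scaling
T = scale_space(S, flat_erosion)
f = lambda x: -(x * x)           # test image; its infimum over an interval sits at an endpoint

lhs = T(2.0)(T(1.0)(f))(0.5)     # (T(2) T(1) f)(0.5)
rhs = T(3.0)(f)(0.5)             # (T(3) f)(0.5)
semigroup_ok = math.isclose(lhs, rhs, rel_tol=1e-9)
```

For this particular kernel operator and the spatial scaling, `T(t)` is a flat erosion over $[-t, t]$, so composing the erosions at scales 2 and 1 agrees with the erosion at scale 3. The test function was chosen so that the sampled infimum is exact; for general `f` the sampling introduces a small error.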

If the resulting family of operators $T_\psi(t)$ satisfies the additive semigroup property

$$T_\psi(t)T_\psi(s) = T_\psi(t + s) , \qquad s, t > 0 , \tag{15}$$

then it is called a scale-space. Note that in [2] we have considered the more
general case where the term at the right-hand side of (15) equals $T_\psi(t \dotplus s)$,
where $\dotplus$ need not be the addition on $(0,\infty)$ but may be some other semigroup
operation such as the supremum. In this paper we will restrict ourselves to the
additive case.

Substituting $t = 1$ in (14) and using that $S(1) = \mathrm{id}$, we get $T_\psi(1) = \psi$.
Henceforth we omit the subscript $\psi$ from $T_\psi(t)$ if no confusion is possible. We
arrive at the following algebraic definition of a scale-space.

Definition 1. Let $S$ be a scaling on $\mathcal{L}$. The family $\{T(t)\}_{t>0}$ of operators on $\mathcal{L}$
is called an $(S,+)$-scale-space if

$$T(t)T(s) = T(t + s) , \qquad s, t > 0 , \tag{16}$$

$$T(t)S(t) = S(t)T(1) , \qquad t > 0 . \tag{17}$$

If it is clear from the context which scaling is meant, then we shall call $\{T(t)\}_{t>0}$
a scale-space. If $\{T(t)\}_{t>0}$ is a scale-space and $T(1) = \psi$, then we call $\psi$ the
kernel operator associated with $\{T(t)\}_{t>0}$.

Assume that $\{T(t)\}_{t>0}$ is an $(S,+)$-scale-space, and let $s, t > 0$; then we get

$$T(t)S(s) = T(t)S(t)S(s/t) = S(t)T(1)S(s/t) = S(s)S(t/s)T(1)S(s/t)$$

$$= S(s)T(t/s)S(t/s)S(s/t) = S(s)T(t/s) .$$

The following result, proved in 0, expresses that the definition of a scale-space
is independent of the choice of the unit scale.

Proposition 1. If $\psi$ is the kernel of a scale-space, then the same holds for every
operator $T_\psi(r)$, with $r > 0$.

In fact, a straightforward computation shows that

$$S(t)T_\psi(r)S(t)^{-1} = T_\psi(rt) ,$$

hence that $T_\psi(r)$ is the kernel of the scale-space $\{T_\psi(rt)\}_{t>0}$.

In jO] we have explored linear as well as morphological scale-spaces. We conclude
this section with a brief discussion of linear scale-spaces; see also dl.
In the next section we consider scale-spaces corresponding with morphological
erosions.

Example 1. (Gaussian Scale-Space) The family $T(t)$ given by

$$(T(t)f)(x) = (2\pi t)^{-d/2} \int_{\mathbb{R}^d} f(x - y) \exp\bigl(-\|y\|^2/(2t)\bigr)\,dy , \tag{18}$$

defines an $(S^{1/2,0},+)$-scale-space with kernel operator

$$\psi(f)(x) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} f(x - y) \exp\bigl(-\|y\|^2/2\bigr)\,dy .$$

The family $T(t)$ given by (18) is called a Gaussian scale-space. It is well-known
that $u(t,x) = (T(t)f)(x)$ solves the linear diffusion equation with initial
condition $u(0,x) = f(x)$; with the normalization used in (18) the equation reads
$u_t = \tfrac{1}{2}\Delta u$, which is (2) after rescaling $t$ by a factor 2.



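Example 2. (Cauchy Scale-Space) The linear operator

$$\psi(f)(x) = \pi^{-(d+1)/2}\, \Gamma\bigl(\tfrac{d+1}{2}\bigr) \int_{\mathbb{R}^d} \frac{f(x - y)}{[1 + \|y\|^2]^{(d+1)/2}}\,dy$$

is the kernel operator of the $(S^{1,0},+)$-scale-space

$$(T_\psi(t)f)(x) = \pi^{-(d+1)/2}\, \Gamma\bigl(\tfrac{d+1}{2}\bigr) \int_{\mathbb{R}^d} \frac{t\,f(x - y)}{[t^2 + \|y\|^2]^{(d+1)/2}}\,dy .$$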



4 Erosion Scale-Spaces 

In this section we discuss a particular family of scale-spaces, namely those governed
by morphological erosions. Recall that an erosion is defined as an operator
that distributes over arbitrary infima [7]. It is a well-known fact (see e.g. 0)
that every translation-invariant erosion $\varepsilon$ on the set $\mathcal{L}$ of functions mapping $\mathbb{R}^d$
into $\overline{\mathbb{R}}$ is of the form

$$\varepsilon_b(f)(x) = \bigwedge_{h \in \mathbb{R}^d} [f(x - h) + b(h)] . \tag{19}$$

In this expression, the function $b$ is called the structuring function. The expression
for the erosion $\varepsilon_b$ is a well-known operation in convex analysis, where it is
called infimal convolution and denoted by $f \,\square\, b$. Thus the erosion in (19) can
be written as $\varepsilon_b(f) = f \,\square\, b$. Often, we will omit the subscript $b$ and write $\varepsilon$
rather than $\varepsilon_b$. If $b$ is the indicator function of a set $B \subseteq \mathbb{R}^d$, i.e., $b = I_B$ with
$I_B(x) = 0$ if $x \in B$ and $+\infty$ if $x \notin B$, then

$$(f \,\square\, b)(x) = \bigwedge_{h \in B} f(x - h) ,$$

which is also written as $f \ominus \check{B}$, where $\check{B} = -B$, the reflection of $B$. The erosion
$\varepsilon(f) = f \ominus \check{B}$ is called a flat erosion 0.

Throughout the remainder of this section we assume that the function $b$ is
lower semi-continuous and convex, and satisfies $b(x) > -\infty$ for every $x \in \mathbb{R}^d$. A
straightforward calculation shows that $T_\varepsilon(t)$ as defined by (14) equals

$$T_\varepsilon(t)f = f \,\square\, S(t)b , \tag{20}$$

where $S = S^{p,q}$. In P] we have analysed in great detail the exact conditions
under which the erosion $\varepsilon$ is the kernel of an $(S^{p,q},+)$-scale-space. Before we
state some of the major results obtained there, we give a definition. A convex
function $b$ is called subpolynomial of degree $k$, where $k > 1$, if

$$b(tx) = t^k b(x) , \qquad x \in \mathbb{R}^d ,\ t > 0 .$$

$b$ is called subpolynomial of degree $+\infty$ if it is an indicator function.

Proposition 2. The family $\{T_\varepsilon(t)\}_{t>0}$ given by (20) defines an $(S^{p,q},+)$-scale-space
in each of the following two cases:

(a) $p = q = 1$;

(b) $q < p < 1$ or $1 < p < q$, and $b$ is subpolynomial of degree $k$ with
$k = (1 - q)/(1 - p)$.

At first sight, this result might suggest that more than one scale-space can be
associated with a structuring function $b$ that is subpolynomial of degree $k$. For
example, if $k = 2$ we can choose $p = 1/2$, $q = 0$ or $p = 3/2$, $q = 2$ (case (b))
and also $p = q = 1$ (case (a)). However, if $b$ is subpolynomial of degree $k$ and
$k = (1 - q)/(1 - p)$, then $q - k(p - 1) = q + (1 - q) = 1$ and hence

$$t^q b(x/t^p) = t^{q - k(p-1)} b(x/t) = t\,b(x/t) ,$$

which means that $S^{p,q}(t)b = S^{1,1}(t)b$. Therefore, in this case all scale-spaces
given by Proposition 2 coincide. Putting

$$u(t,\cdot) = f \,\square\, S(t)b , \tag{21}$$

where $S = S^{1,1}$, it is known |H| that $u$ is a solution of the Hamilton-Jacobi
equation (8), that is,

$$u_t = -H(\nabla u) , \tag{22}$$

where the Hamiltonian $H$ is the Young-Fenchel conjugate (also called Legendre
transform) of $b$:

$$H(x) = b^*(x) = \bigvee_{y \in \mathbb{R}^d} [\langle x, y\rangle - b(y)] . \tag{23}$$

The structuring function $b$ is called the Lagrangian. Since $(b^*)^* = b$, it follows
that $b$ can be recovered from $H$ through $b = H^*$. The solution formula in (21)
is sometimes called the Lax-Oleinik formula.
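The coincidence of the scalings for a subpolynomial structuring function, and the conjugate in (23), can be checked numerically in one dimension. This is a small sketch under assumptions made here: $b(x) = \tfrac{1}{2}|x|^2$ is used as the degree-2 example, the supremum in (23) is restricted to a finite grid, and the helper names `scaled` and `conjugate` are hypothetical.

```python
import math


def scaled(b, p, q, t):
    """(S^{p,q}(t) b)(x) = t**q * b(x / t**p)."""
    return lambda x: (t ** q) * b(x / (t ** p))


def conjugate(b, ys):
    """Young-Fenchel conjugate (23) in one dimension, with the supremum
    restricted to the sample points ys (an approximation assumed here)."""
    return lambda x: max(x * y - b(y) for y in ys)


k = 2.0
b = lambda x: 0.5 * abs(x) ** k          # subpolynomial of degree k = 2

t, x = 3.0, 1.7
umbral = scaled(b, 1.0, 1.0, t)(x)       # t * b(x / t)
quadratic = scaled(b, 0.5, 0.0, t)(x)    # p = 1/2, q = 0, so (1 - q)/(1 - p) = 2
other = scaled(b, 1.5, 2.0, t)(x)        # p = 3/2, q = 2, so (1 - q)/(1 - p) = 2

ys = [i / 1000.0 for i in range(-5000, 5001)]
H = conjugate(b, ys)                     # for b(y) = y^2 / 2 this approximates H(x) = x^2 / 2
```

All three scaled structuring functions take the same value at `x`, in line with $S^{p,q}(t)b = S^{1,1}(t)b$, and the sampled conjugate reproduces the quadratic Hamiltonian for arguments well inside the grid.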

If $H$ is convex and coercive, i.e.,

$$\lim_{\|x\| \to +\infty} \frac{H(x)}{\|x\|} = +\infty ,$$

then $u(t,\cdot)$ given by (21) is a so-called viscosity solution of (22); refer to for
more details. We consider two examples in more detail.

Example 3. Quadratic Structuring Functions

Consider the quadratic scaling $S$ corresponding with $p = 1/2$ and $q = 0$. Proposition 2
tells us that $\varepsilon(f) = f \,\square\, b$ is the kernel of an $(S,+)$-scale-space $T_\varepsilon(t)$ if $b$
is subpolynomial of degree 2. It is given by

$$(T_\varepsilon(t)f)(x) = \bigwedge_{y \in \mathbb{R}^d} [f(x - y) + b(y/\sqrt{t})] .$$

Note that in this expression, $b(y/\sqrt{t})$ may be replaced by $t^{-1} b(y)$ or $t\,b(y/t)$. A
typical example of a convex function which is subpolynomial of degree 2 is

$$b_Q(x) = \tfrac{1}{2}\langle Qx, x\rangle ,$$

where $Q$ is a symmetric positive semi-definite matrix. It is known that

$$b_Q \,\square\, b_R = b_{Q \times R} ,$$

where $Q \times R = (Q^{-1} + R^{-1})^{-1}$. The erosion $T_\varepsilon(t)$ is given by $T_\varepsilon(t)f = f \,\square\, b_{t^{-1}Q}$
and the semigroup property $T_\varepsilon(t)T_\varepsilon(s) = T_\varepsilon(t + s)$ can also be derived from the
fact that

$$b_{t^{-1}Q} \,\square\, b_{s^{-1}Q} = b_{t^{-1}Q \times s^{-1}Q} = b_{(t+s)^{-1}Q} .$$

This scale-space has been called the parabolic morphological scale-space; see
as well as ITTinEI . The function $u$ given by

$$u(t,x) = (f \,\square\, b_{t^{-1}Q})(x)$$

is a solution of the Hamilton-Jacobi equation (22) with $H(x) = b_Q^*(x) = \tfrac{1}{2}\langle Q^{-1}x, x\rangle$;
see fill Part II, Sect. X. I]. If we choose $Q = I$, (22) reduces to

$$u_t = -\tfrac{1}{2}\|\nabla u\|^2 . \tag{24}$$
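The identity $b_Q \,\square\, b_R = b_{Q \times R}$ and the resulting semigroup property can be illustrated in one dimension, where the matrices reduce to positive numbers. The following is a sketch only: the infimum is taken over a finite grid, and the names `inf_convolution`, `quadratic` and `parallel_sum` are hypothetical helpers introduced for this example.

```python
import math


def inf_convolution(f, b, ys):
    """Infimal convolution: inf over y of f(x - y) + b(y), with y restricted to ys."""
    return lambda x: min(f(x - y) + b(y) for y in ys)


def quadratic(q):
    """b_Q in one dimension, where the matrix Q is just a positive number q."""
    return lambda x: 0.5 * q * x * x


def parallel_sum(q, r):
    """Q x R = (Q^{-1} + R^{-1})^{-1} for positive numbers."""
    return 1.0 / (1.0 / q + 1.0 / r)


ys = [i / 1000.0 for i in range(-5000, 5001)]   # sample grid on [-5, 5]
x = 1.3

numeric = inf_convolution(quadratic(2.0), quadratic(3.0), ys)(x)
closed_form = quadratic(parallel_sum(2.0, 3.0))(x)

# Semigroup check from the text: t^{-1}Q x s^{-1}Q = (t + s)^{-1} Q.
Q, t, s = 4.0, 0.5, 1.5
combined = parallel_sum(Q / t, Q / s)
```

Here `numeric` and `closed_form` agree up to the grid resolution, and `combined` equals `Q / (t + s)`, which is the scalar form of the identity used to derive the semigroup property.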



Example 4. Flat Structuring Functions

Consider the spatial scaling given by $p = 1$ and $q = 0$. From Proposition 2 we
derive that $b$ has to be subpolynomial of degree $+\infty$, that is $b = I_B$, where $B$
is a closed convex set. The scale-space induced by $\varepsilon(f) = f \,\square\, b$ is given by

$$(T_\varepsilon(t)f)(x) = \bigwedge_{y \in tB} f(x - y) .$$

In other words, $T_\varepsilon(t)$ is a flat erosion with structuring element $tB$. The function
$u(t,\cdot) = T_\varepsilon(t)f$ satisfies the PDE

$$u_t = -H_B(\nabla u) ,$$

where $H_B(x) = I_B^*(x) = \bigvee_{y \in B} \langle x, y\rangle$ is the support function of the convex set
$B$; see e.g. m. If $B$ is symmetric (i.e. $B = \check{B}$) and contains the origin in its
interior, then $H_B$ is the norm associated with the unit ball $B^\circ$, the polar set of
$B$, defined by

$$B^\circ = \{x \in \mathbb{R}^d \mid \langle y, x\rangle \le 1 \text{ for all } y \in B\} .$$

For example, if $B$ is the Euclidean unit disk in $\mathbb{R}^2$, then $B^\circ = B$ and $H_B(x) = \|x\|$,
the Euclidean norm. The resulting PDE for $u$ is

$$u_t = -\|\nabla u\| , \tag{25}$$

which we have already encountered in (7). If $B$ is the square $\|x\|_\infty \le 1$, where
$\|x\|_\infty = \max(|x_1|, |x_2|)$, then $B^\circ$ is the diamond shape $\|x\|_1 \le 1$ where
$\|x\|_1 = |x_1| + |x_2|$, and in this case the resulting PDE is

$$u_t = -\|\nabla u\|_1 = -\Bigl(\Bigl|\frac{\partial u}{\partial x_1}\Bigr| + \Bigl|\frac{\partial u}{\partial x_2}\Bigr|\Bigr) .$$

PDE’s of this type were first considered by Brockett and Maragos 0.
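The two support functions in this example can be checked numerically. The sketch below assumes that the set $B$ is represented by a finite list of points (the vertices of the square, and boundary samples of the disk); the name `support_function` and this point-based representation are assumptions made for illustration.

```python
import math


def support_function(points):
    """H_B(x) = sup over y in B of <x, y>, with B given by sample points.

    For a convex polygon the vertices suffice; for the disk, boundary
    samples give an approximation from below.
    """
    return lambda x: max(x[0] * y[0] + x[1] * y[1] for y in points)


square = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
disk = [(math.cos(2 * math.pi * i / 3600), math.sin(2 * math.pi * i / 3600))
        for i in range(3600)]

x = (3.0, -4.0)
h_square = support_function(square)(x)   # l1 norm of x: 7.0
h_disk = support_function(disk)(x)       # close to the Euclidean norm of x, which is 5
```

The square yields the $\ell_1$ norm and the disk yields the Euclidean norm, matching the Hamiltonians $\|\nabla u\|_1$ and $\|\nabla u\|$ above.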

Equations (24) and (25) correspond with the cases $m = 2$ and $m = 1$ in the
one-parameter family of PDE’s

$$u_t = -\tfrac{1}{m}\|\nabla u\|^m , \qquad m \ge 1 . \tag{26}$$

If $m > 1$, then the Hamiltonian $H_m(x) = \tfrac{1}{m}\|x\|^m$ has a conjugate $b_m = H_m^*$
given by

$$b_m(x) = \frac{m-1}{m}\,\|x\|^{m/(m-1)} .$$

Furthermore, $b_1$ is the indicator function of the Euclidean unit disk. Therefore,
the solution of (26) at time instant $t$ is obtained through the erosion of the
input function $f$ with structuring function $S^{1,1}(t)b_m$. It is easy to verify that the
erosion with structuring function $b_m$ is also the kernel of an $(S^{1/m,0},+)$-scale-space
if $m > 1$.



5 Scale-Invariance of PDE’s 

In Section 2 we have expressed our interest in initial value problems of the form

$$\begin{cases} u_t = A(u) , \\ u(0,x) = f(x) , \end{cases} \tag{27}$$

where the initial condition $f$ corresponds with the input image. Here $A$ is the
differential operator

$$A(u)(x) = F\bigl(\nabla^2 u(x), \nabla u(x), u(x), x\bigr) , \tag{28}$$

where $F$ has the properties listed in Section 2.

An important, and often very difficult, problem concerns the existence and
uniqueness of solutions of the initial value problem (27). The difficulty of such
problems depends not only on the specific PDE involved but also on the properties
of the underlying function space. In this paper we will not deal with the
existence and uniqueness problem, but assume that there is a space $\mathcal{L}$ such
that for every input image $f \in \mathcal{L}$ the initial value problem (27) has a solution
$u(t) = T(t)f$ in $\mathcal{L}$. Note that the exact meaning of the concept “solution” needs
to be specified. In particular, if $A$ is the infinitesimal generator of a $C_0$-semigroup
$T(t)$ on a Banach space $\mathcal{L}$, then the solution concept has a specific meaning for
initial conditions $f \in D(A)$, the domain of $A$, which is known to lie dense in the
underlying space $\mathcal{L}$ m.

Throughout the remainder of this section we assume that we can associate a
solution $u(t) = T(t)f$ with the initial value problem (27). Since the underlying
PDE is autonomous, the one-parameter family $\{T(t) \mid t > 0\}$ of operators on $\mathcal{L}$
forms a semigroup, i.e., $T(t + s) = T(t)T(s)$, for $t, s > 0$. Recalling our definition
of a scale-space from Definition 1 we see that $T(t)$ defines an $(S,+)$-scale-space
if (17) holds, or equivalently, if

$$T(t)S(s) = S(s)T(t/s) , \qquad t, s > 0 . \tag{29}$$

Here $S(\cdot)$ is a scaling and, as before, we shall restrict ourselves to scalings $S^{p,q}$.
Relation (29) can be formulated in terms of the initial value problem (27). Assume
that $u(t,x)$ is a solution of (27), fix $s > 0$, and define

$$\tilde{u}(t,x) = s^q u(t/s, x/s^p) . \tag{30}$$

In order for (29) to hold, it is necessary and sufficient that $\tilde{u}$ is a solution of the
initial value problem (27) with initial function $f$ replaced by $S(s)f$. Note indeed
that

$$\tilde{u}(0,x) = s^q u(0, x/s^p) = s^q f(x/s^p) .$$

Thus it remains to be verified that $\tilde{u}$ satisfies $\tilde{u}_t = A(\tilde{u})$. This yields the following
relation for $A$:

$$S(s)\,A\,S(s)^{-1} = sA , \qquad \text{for } s > 0 . \tag{31}$$

Note that this equation can also be obtained by differentiating (29) with respect
to $t$ and substituting $t = 0$.

If we assume in addition that $A$ is of the form (28), then we arrive at the
equation

$$s^{q-1} F(w, v, u, x) = F(s^{q-2p} w, s^{q-p} v, s^q u, s^p x) . \tag{32}$$
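The step from (31) to (32) can be spelled out as follows. This is a sketch filled in from the definitions of $S^{p,q}$ and $A$ given above; the test image $g$ and the abbreviations $w$, $v$, $u$ for its derivatives at the point $x/s^p$ are notation introduced here for illustration.

```latex
\begin{align*}
(S(s)g)(x) &= s^{q}\, g(x/s^{p}), \\
\nabla (S(s)g)(x) &= s^{q-p}\,(\nabla g)(x/s^{p}), \qquad
\nabla^{2} (S(s)g)(x) = s^{q-2p}\,(\nabla^{2} g)(x/s^{p}), \\
\bigl(A\,S(s)g\bigr)(x) &= F\bigl(s^{q-2p} w,\; s^{q-p} v,\; s^{q} u,\; x\bigr), \\
\bigl(S(s)\,A g\bigr)(x) &= s^{q}\, F\bigl(w,\; v,\; u,\; x/s^{p}\bigr),
\end{align*}
% with w = (\nabla^2 g)(x/s^p), v = (\nabla g)(x/s^p), u = g(x/s^p).
% Relation (31) is equivalent to S(s) A = s A S(s), which gives
\[
s^{q}\, F\bigl(w, v, u, x/s^{p}\bigr) = s\, F\bigl(s^{q-2p} w,\; s^{q-p} v,\; s^{q} u,\; x\bigr),
\]
% and replacing x by s^p x and dividing by s yields (32).
```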

We consider some examples. 

Example 5. (a) For the linear diffusion equation we have $F(w, v, u, x) = \operatorname{trace}(w)$,
and (32) yields $q - 1 = q - 2p$, that is $p = 1/2$. Notice that in this case, like in all
linear cases, the value of $q$ is irrelevant.

(b) The Hamilton-Jacobi equation (8) corresponds with $F(w, v, u, x) = -H(v)$
and (32) yields

$$s^{q-1} H(v) = H(s^{q-p} v) .$$

If $H$ is subpolynomial of degree $l$, then the right-hand side equals $s^{l(q-p)} H(v)$
and we find

$$q - 1 = l(q - p) .$$

These values of $p$, $q$, $l$ correspond with those that follow from Proposition 2 with
$b = H^*$. Note that $H$ is subpolynomial of degree $l$ if and only if $b = H^*$ is
subpolynomial of degree $k$ where $1/l + 1/k = 1$.

(c) The anisotropic diffusion equation (3) gives rise to a scale-space if $c$ is of the
form

$$c(v) = \|v\|^{-m} ,$$

for some $m > 0$. In that case (32) gives rise to the relation

$$2p + m(q - p) = 1 .$$

PDE’s of this type have been investigated by Tsurkov m for the one-dimensional
case.
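The exponent relations in Example 5 are simple enough to check with exact rational arithmetic. The sketch below uses hypothetical helper names and a few sample values of $p$ and $q$ chosen here; it confirms that the degree $k$ from Proposition 2 and the degree $l$ from $q - 1 = l(q - p)$ are conjugate exponents, and solves the relation in (c) for $q$.

```python
from fractions import Fraction


def erosion_degree(p, q):
    """Degree k of the structuring function in Proposition 2(b): k = (1 - q)/(1 - p)."""
    return (1 - q) / (1 - p)


def hamiltonian_degree(p, q):
    """Degree l of the Hamiltonian from q - 1 = l (q - p)."""
    return (q - 1) / (q - p)


def diffusion_q(p, m):
    """Solve 2p + m (q - p) = 1 for q (Example 5(c)); requires m != 0."""
    return p + (1 - 2 * p) / m


# Sample (p, q) pairs; the first two are the k = 2 cases mentioned in Section 4.
pairs = [(Fraction(1, 2), Fraction(0)), (Fraction(3, 2), Fraction(2)),
         (Fraction(2, 3), Fraction(0))]
conjugate_sums = [1 / erosion_degree(p, q) + 1 / hamiltonian_degree(p, q)
                  for p, q in pairs]
```

Every entry of `conjugate_sums` equals 1, which is the relation $1/l + 1/k = 1$ stated in (b).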
Abstract. We propose a new complete method to extract significant description(s)
of planar curves according to constant curvature segments.
This method is based (i) on a multi-scale segmentation and curve ap- 
proximation algorithm, defined by two grouping processes (polygonal and 
constant curvature approximations), leading to a multi-scale covering of 
the curve, and (ii) on an intra- and inter-scale classification of this multi- 
scale covering guided by heuristically-defined qualitative labels leading 
to pairs (scale, list of constant curvature segments) that best describe 
the shape of the curve. Experiments show that the proposed method is 
able to provide salient segmentation and approximation results which 
respect shape description and recognition criteria. 



1 Introduction 

In order to easily manipulate a planar curve or databases composed of planar
curves, it would be interesting to represent data according to primitives
which describe them in a way that respects their actual shape for recognition
and compression purposes. In this paper, we present an improved version of the
multi-scale segmentation and curve approximation method introduced in DP.
It is related to the category of methods favoring shape recovery PPi.
However, this new method tries to go beyond the limitations generated by the other
methods by identifying in a formal way the requirements related to the problem
of segmentation and approximation of a planar curve. The associated algorithm
represents a generalization of the recover-and-select paradigm established
by Leonardis and Bajcsy.

The original method that we propose in order to extract significant descrip- 
tion(s) of planar curves into lists of constant curvature segments (such as straight 
line segments and/or circular arcs) is based on MuscaGrip (which stands for 
MUlti-scale Segmentation and Curve Approximation based on the Geometry of 
Regular Inscribed Polygons), a multi-scale segmentation and curve approxima- 
tion algorithm, leading to a multi-scale covering of the curve. The MuscaGrip 
algorithm is defined by two grouping processes: (i) a polygonal approximation
(from points to straight line segments), and (ii) a constant curvature approximation
(from straight line segments to straight line segments and/or circular arcs).

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 227-[2^^ 2001. 

© Springer-Verlag and IEEE/CS 2001

MuscaGrip repeats the first grouping process using each point on the curve as its
starting point and the second grouping process using each straight line segment 
provided by the first process as its starting segment. These repetitions lead to 
a complete description of the curve composed of lists of constant curvature seg- 
ments at different scales. Although they increase the computational load of the 
algorithm, these repetitions are necessary in order to respect invariance criteria. 
In order to find a set of pairs composed of one scale and one list of constant cur- 
vature segments that best describe the shape of the curve, a global combinatorial 
method of the multi-scale covering is introduced, guided by heuristically-defined 
qualitative labels leading to a single non-redundant subset. 

In the following, the description of the complete method is presented. The 
MuscaGrip grouping processes are first described. The method to extract the 
minimal set of adequate pairs (scale, list of constant curvature segments) is then 
introduced in detail with its inter- and intra-scale classification steps. Finally,
experimental results are presented for open and closed planar curves. 



2 MuscaGrip: Point Grouping Process 

The generic process first splits a planar curve into several sub-curves, each of
which is approximated by a straight line segment. The associated point grouping
criterion is equivalent to a co-circularity criterion among the connected points
of the sub-curve. A scale parameter, acting as a maximum deviation criterion, is
associated with a scale measure.

It is assumed that a point chain of two points forms a uniform sub-curve.
A chain of three or more points forms a uniform sub-curve if and only if the
perpendicular distance of each point of the chain relative to the straight line
joining the two endpoints of the chain is less than or equal to the scale parameter.
This step is repeated at a number of scales, and using every point on the curve
as a starting point, to provide a multi-scale set of polygonal approximations.
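The uniformity test for a point chain translates directly into code. The sketch below implements that test as stated; the greedy way of growing each sub-curve in `polygonal_approximation`, the restriction to open curves, and all function names are assumptions made here for illustration and are not claimed to be the MuscaGrip procedure itself.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:                      # degenerate chord: a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def is_uniform(chain, scale):
    """Two points are always uniform; longer chains are uniform when every
    interior point lies within `scale` of the chord joining the endpoints."""
    if len(chain) <= 2:
        return True
    a, b = chain[0], chain[-1]
    return all(perpendicular_distance(p, a, b) <= scale for p in chain[1:-1])


def polygonal_approximation(points, scale, start=0):
    """Greedy split of an open curve, from index `start` to the last point,
    into maximal uniform sub-curves (the greedy strategy is an assumption)."""
    segments = []
    first = start
    n = len(points)
    while first < n - 1:
        last = first + 1
        while last + 1 < n and is_uniform(points[first:last + 2], scale):
            last += 1
        segments.append((points[first], points[last]))
        first = last
    return segments


# An L-shaped curve: a fine scale keeps the corner, a coarse scale removes it.
curve = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
fine = polygonal_approximation(curve, scale=0.1)
coarse = polygonal_approximation(curve, scale=2.0)
```

Calling `polygonal_approximation` for every value of `start` and for every scale in the list of scales would give a multi-scale family of approximations in the spirit of the text, at the cost of the repeated computation the authors mention.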

More formally, let $C$ be an open or closed planar curve, an ordered list of
$n$ points $p_i(x_i, y_i)$, for $i \in [1, n]$, where $p_i, p_{i+1}$ are consecutive points along
the sampled curve; then $C = \{p_i(x_i, y_i) \mid i \in [1, n] \wedge x_i \in \mathbb{R} \wedge y_i \in \mathbb{R}\}$. When $C$ is
open, $p_1$ is obtained from the first point of the curve and $p_n$ from the last point.
Otherwise, $p_1$ (consequently $p_n$) is obtained from an arbitrarily selected point
of $C$. Let $S$ be an ordered list of $m$ scales $s_j$, for $j \in [1, m]$, where $s_j, s_{j+1}$ are
consecutive scales; then $S = \{s_j \mid j \in [1, m] \wedge s_j \in \mathbb{R}\}$ with $s_1$ the finest scale,
and $s_m$ the coarsest scale.

At a given scale $s_j \in S$, $PA_C(s_j, p_i)$, a polygonal approximation associated
with a planar curve $C$ and generated from point $p_i \in C$, is defined by an ordered
list of $p$ straight line segments $sls_k(p_q, p_r)$, for $k \in [1, p]$, where $p_q$ and $p_r$ are
the first and last points of a uniform sub-curve of $C$. We thus have:

$$PA_C(s_j, p_i) = \{\, sls_k(p_q, p_r) \mid k \in [1, p] \wedge p_q, p_r \in C \,\} . \tag{1}$$

For a closed planar curve, $PA_C(s_j, p_i)$ is defined by $PA_C^{cw}(s_j, p_i)$ in the clockwise
direction and $PA_C^{ccw}(s_j, p_i)$ in the counter-clockwise direction (overshoot
can occur).

3 MuscaGrip: Straight Line Segment Grouping Process 

For a polygonal approximation, the constant curvature approximation process
aims at grouping $n \ge 2$ adjacent straight line segments into circular arcs whenever
feasible. The associated uniformity criterion is based on the model of a regular
polygon, formed of $n \ge 2$ segments, approximating the circular arc (noted ca
in most figures) into which it is inscribed. Let $a$ be the radius of the inscribed
circular arc, and let $R$ be the radius of the circumscribed circular arc; the difference
between $R$ and $a$ is related to $s_j$, the scale parameter. The constant curvature
approximation is then obtained using a merging process of consecutive straight
line segments of the polygonal approximation.

Given a polygonal approximation and a regular polygon whose features are
induced by a sublist of this polygonal approximation, is it possible, by
adding to this sublist a straight line segment adjacent to it, for the new sublist
still to be the basis of a regular polygon whose features are similar to those of
the old instance? If such is the case after consideration of a set of uniformity
criteria, a new straight line segment adjacent to the sublist is targeted whenever
possible.

More formally, let $\mathcal{P}'$ be a sublist of a polygonal approximation composed of
$p$ straight line segments. $\mathcal{P}'$ is defined by an ordered list of $p'$ segments, with
$p' \le p$. If $p'$ is equal to 1, $\mathcal{P}'$ is only composed of one straight line segment,
and $\mathcal{P}'$ is uniform. Before continuing, let us note that a regular polygon $\mathcal{RP}'$
originating from a uniform sublist $\mathcal{P}'$ is entirely described by (i) the angle $\theta'$
between two consecutive sides, (ii) the length $l'$ of each side and (iii) $n'$, its
number of sides. Derived features are deduced: (i) $R'$, the value of the radius of
the circumscribed circular arc, and (ii) $a'$, the value of the radius of the inscribed
circular arc, commonly called apothem. If $p'$ is equal to 2, $\mathcal{P}'$, composed of two
straight line segments $sls_1$ and $sls_2$, will be considered uniform if and only if
the regular polygon $\mathcal{RP}'$ originating from it is validated by the following
uniformity criteria:

1. $s_j - \delta s < R' - a'$ and $R' - a' < s_j + \delta s$, where $\delta s$ corresponds to the step
between two consecutive scales of $S$,

2. the features ($l_{sls_1}$ and $l_{sls_2}$, the lengths of the segments, and the angle
between the two segments) describing $\mathcal{P}'$ must be validated by the features
describing two regular control polygons, $\mathcal{RP}'_{s_j - \delta s}$ and $\mathcal{RP}'_{s_j + \delta s}$. These latter are
induced by $R'$ and the scales $s_j - \delta s$ and $s_j + \delta s$.

When $\mathcal{P}'$ is uniform, a first instance of a regular polygon $\mathcal{RP}'$ inscribed into
the approximating circular arc is created.

If $\mathcal{P}'$ is uniform then the sublist $\mathcal{P}$ composed of $p$ ($p = p' + 1$) straight line
segments, and defined by $\mathcal{P} = \{\mathcal{P}' \cup \{sls\} \mid sls \in PA\}$, is also uniform if and
only if a regular polygon $\mathcal{RP}$, defined by $\theta$, $l$ and $n$, can be deduced from $\mathcal{P}$ and
$\mathcal{RP}'$ according to various uniformity criteria:

1. $p$ is equal to or higher than 2,

2. $p$ is lower than or equal to $n'$, the number of sides of $\mathcal{RP}'$, the regular polygon
induced by $\mathcal{P}'$,

3. $s_j - \delta s < R - a$ and $R - a < s_j + \delta s$,

4. the features (length of the segments, angle between two consecutive segments)
describing $\mathcal{P}$ must be validated by the features describing two regular control
polygons, $\mathcal{RP}_{s_j - \delta s}$ and $\mathcal{RP}_{s_j + \delta s}$. These latter are induced by $R$ and the scales
$s_j - \delta s$ and $s_j + \delta s$.
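The derived features of a regular polygon and the scale-band criterion can be sketched as follows. This is only an illustration: it assumes that the angle between two consecutive sides is the interior angle of the polygon, it uses the standard formulas for the circumscribed radius and the apothem of a regular polygon, and the function names are hypothetical. The validation against the two control polygons is not covered, since the text does not specify it in enough detail.

```python
import math


def regular_polygon_features(side, interior_angle):
    """Derive (n, R, a) for a regular polygon from its side length and the
    interior angle between consecutive sides (in radians).

    n: number of sides, R: circumscribed radius, a: inscribed radius (apothem).
    For measured data n need not come out as an integer.
    """
    n = 2.0 * math.pi / (math.pi - interior_angle)
    half = math.pi / n                     # half of the central angle of one side
    R = side / (2.0 * math.sin(half))
    a = side / (2.0 * math.tan(half))
    return n, R, a


def within_scale_band(R, a, s_j, delta_s):
    """Criterion s_j - delta_s < R - a < s_j + delta_s."""
    return s_j - delta_s < R - a < s_j + delta_s


# A square with side 2: n = 4, R = sqrt(2), a = 1.
n, R, a = regular_polygon_features(side=2.0, interior_angle=math.pi / 2)
accepted = within_scale_band(R, a, s_j=0.4, delta_s=0.05)
rejected = within_scale_band(R, a, s_j=1.0, delta_s=0.05)
```

The difference `R - a` is the largest gap between the polygon and its circumscribed arc, which is why comparing it with the scale parameter acts as a deviation test.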

At a given scale $s_j \in S$, $CCA_C(s_j, p_i, sls_k)$, a constant curvature approximation
related to a polygonal approximation $PA_C(s_j, p_i)$ and initiated by the
straight line segment $sls_k$, is defined by an ordered list of $q + r$ constant curvature
segments $ccs_s$ for $s \in [1, q + r]$: $q$ straight line segments $sls_u$ for $u \in [1, q]$ and $r$
circular arcs $ca_v$ for $v \in [1, r]$. $ca_v$ is provided by the grouping of an ordered list
of straight line segments according to the uniformity criteria listed above. We
thus have:

$$CCA_C(s_j, p_i, sls_k) = \{\, ccs_s \mid s \in [1, q + r] \wedge (ccs_s \in PA_C(s_j, p_i) \vee ccs_s = \cup\, sls) \,\} . \tag{2}$$

Overlap can occur inside $CCA_C(s_j, p_i, sls_k)$. Once again, this step is repeated
using every straight line segment provided by the polygonal approximation as a
starting segment.



4 Extraction of the Best Descriptions 

A significant computational load results from the proposed multi-scale segmen- 
tation and approximation of a planar curve C. This multi-scale method leads to 
many representations. Among them, only the most salient ones should be considered.
For that purpose, we define an intra- and inter-scale classification of this
multi-scale description, guided by heuristically-defined qualitative labels leading 
to a set of representation(s) which respect shape description and recognition 
criteria. 



4.1 Labeling of Polygonal Approximations 

A classification of the results obtained from the first grouping process is a good 
starting point for extracting salient approximations. We associate a qualitative 
label to each polygonal approximation associated with both open and closed 
curves. Three labels are defined, label VGpA for Very Good Polygonal Approx- 
imation, label GpA for Good Polygonal Approximation, and label Apa for Ac- 
ceptable Polygonal Approximation. 

In the case of an open curve C, at scale Sj, (i) label VGpa means that end- 
points of C, Pi and are real endpoints of PAc{sj,pi), (ii) label GpA means 
that Pi or pn is a virtual endpoint of P Ac{sj,pi), the other being real, and 
(Hi) label Apa means that pi and are both virtual endpoints of P Ac{sj,pi). 
In the same way, for a closed curve C, at scale Sj, (i) label VGpa means that 
Pi is the starting and ending point of PAf^'“{sj,pi) and (sj,pi), (ii) 

label GpA means that pi is the starting and ending point of PAf^'^{sj,pi) or 
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(sj,pi), and (Hi) label Apa means that pi is the starting and ending 
point of neither PAf^'^{sj,pi) and PA’^^^'^{sj,pi). In the latter two cases, over- 
shoot occurs. If 

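\[
\begin{aligned}
S(VG_{PA_C}) &= \{PA_C(s_j, p_i) \mid \text{label} = VG_{PA}\}, \\
S(G_{PA_C}) &= \{PA_C(s_j, p_i) \mid \text{label} = G_{PA}\}, \qquad (3) \\
S(A_{PA_C}) &= \{PA_C(s_j, p_i) \mid \text{label} = A_{PA}\},
\end{aligned}
\]

where $S(X_{PA_C})$ is a set composed of polygonal approximations of C labeled X, 
then 

\[
\begin{aligned}
S(VG_{PA_C}) \cap S(G_{PA_C}) &= \emptyset, \\
S(G_{PA_C}) \cap S(A_{PA_C}) &= \emptyset, \qquad (4) \\
S(VG_{PA_C}) \cap S(A_{PA_C}) &= \emptyset.
\end{aligned}
\]

Therefore, 

\[ S(VG_{PA_C}) \cup S(G_{PA_C}) \cup S(A_{PA_C}) = PA_C(s_j). \qquad (5) \]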
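The open-curve labeling rule and the partition property stated in (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tuple layout (name, first endpoint real, last endpoint real) and the sample data are hypothetical and are not taken from the MuscaGrip implementation.

```python
def label_open_curve(first_endpoint_is_real, last_endpoint_is_real):
    """Label a polygonal approximation of an open curve from its two endpoints."""
    real_count = int(first_endpoint_is_real) + int(last_endpoint_is_real)
    # Two real endpoints: Very Good; one: Good; none: Acceptable.
    return {2: "VG_PA", 1: "G_PA", 0: "A_PA"}[real_count]


def partition_by_label(approximations):
    """Group approximations, given as (name, first_real, last_real), by label."""
    groups = {"VG_PA": set(), "G_PA": set(), "A_PA": set()}
    for name, first_real, last_real in approximations:
        groups[label_open_curve(first_real, last_real)].add(name)
    return groups


# Hypothetical approximations of one curve at two scales and two starting points.
approximations = [
    ("PA(s1,p1)", True, True),
    ("PA(s1,p2)", True, False),
    ("PA(s2,p1)", False, False),
    ("PA(s2,p2)", False, True),
]
groups = partition_by_label(approximations)
```

Because each approximation receives exactly one label, the three sets are pairwise disjoint and their union is the full set of approximations, which is what (4) and (5) express.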



4.2 Labeling of Constant Curvature Approximations 

Following the first grouping process, a labeled polygonal approximation 
$PA_C(s_j, p_i)$ leads to $CCA_C(s_j, p_i)$ composed of p $CCA_C(s_j, p_i, sls_k)$. 
Each one can in turn be qualitatively labeled: label $VG_{CCA}$ for Very Good 
Constant Curvature Approximation, label $G_{CCA}$ for Good Constant Curvature 
Approximation, and label $A_{CCA}$ for Acceptable Constant Curvature Approximation. 

For both open and closed curves, (i) label $A_{CCA}$ means that overlap (and 
overshoot if C is closed) occurs in $CCA_C(s_j, p_i, sls_k)$, (ii) label $G_{CCA}$ 
means that no overlap (and no overshoot if C is closed) occurs in 
$CCA_C(s_j, p_i, sls_k)$ but the ratio between the number of constant curvature 
segments and the number of straight line segments from $PA_C(s_j, p_i)$ is close 
to 1.0; consequently $CCA_C(s_j, p_i, sls_k)$ is composed principally of straight 
line segments and then 

\[ CCA_C(s_j, p_i, sls_k) = PA_C(s_j, p_i), \qquad (6) \]

and (iii) label $VG_{CCA}$ means that no overlap (and no overshoot if C is closed) 
occurs in $CCA_C(s_j, p_i, sls_k)$ and the ratio (also called compression rate) 
between the number of circular arcs and the number of constant curvature segments 
is high. When $s_j$ is coarse, $G_{CCA}$ is used more often than $VG_{CCA}$ because 
grouping adjacent straight line segments into circular arcs is less feasible, 
the number of straight line segments in $PA_C(s_j, p_i)$ decreasing with increasing 
scale $s_j$. If 

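\[
\begin{aligned}
S(VG_{CCA_C}) &= \{CCA_C(s_j, p_i, sls_k) \mid \text{label} = VG_{CCA}\}, \\
S(G_{CCA_C}) &= \{CCA_C(s_j, p_i, sls_k) \mid \text{label} = G_{CCA}\}, \qquad (7) \\
S(A_{CCA_C}) &= \{CCA_C(s_j, p_i, sls_k) \mid \text{label} = A_{CCA}\},
\end{aligned}
\]

where $S(X_{CCA_C})$ is a set composed of constant curvature approximations of C 
labeled X, then 

\[
\begin{aligned}
S(VG_{CCA_C}) \cap S(G_{CCA_C}) &= \emptyset, \\
S(G_{CCA_C}) \cap S(A_{CCA_C}) &= \emptyset, \qquad (8) \\
S(VG_{CCA_C}) \cap S(A_{CCA_C}) &= \emptyset.
\end{aligned}
\]

Therefore, 

\[ S(VG_{CCA_C}) \cup S(G_{CCA_C}) \cup S(A_{CCA_C}) = CCA_C(s_j, p_i). \qquad (9) \]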

The most interesting description for a curve is a set of one or several $VG_{CCA}$ 
provided by a $VG_{PA}$. If no $VG_{PA}$ exists, then the most interesting 
description for a curve is a set of one or several $VG_{CCA}$ provided by a 
$G_{PA}$. The compression rate makes it possible to partition $S(CCA_C)$. If 
several $CCA_C(s_j, p_i, sls_k)$ have the same compression rate then they fall in 
the same class of this partition, and they can in turn be classified according to 
the accumulation of the errors (also called error rate) generated between each 
pair of adjacent constant curvature segments. A high compression rate and a low 
error rate are thus significant factors. 
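The selection rule described above can be sketched as a small ranking routine. It is a sketch under assumptions, not the authors' code: the `Description` record and its field names are hypothetical, the number of constant curvature segments is taken to be the number of straight line segments plus the number of circular arcs, and the error rate is assumed to be supplied as a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    scale: float
    n_line_segments: int   # straight line segments kept in the approximation
    n_circular_arcs: int   # circular arcs in the approximation
    error_rate: float      # accumulated error between adjacent segments
    pa_label: str          # "VG_PA", "G_PA" or "A_PA"
    cca_label: str         # "VG_CCA", "G_CCA" or "A_CCA"

    @property
    def compression_rate(self):
        # Assumption: constant curvature segments = line segments + arcs.
        total = self.n_line_segments + self.n_circular_arcs
        return self.n_circular_arcs / total if total else 0.0


def best_descriptions(candidates):
    """Rank VG_CCA candidates, preferring those that come from a VG_PA."""
    for pa_label in ("VG_PA", "G_PA"):
        pool = [d for d in candidates
                if d.pa_label == pa_label and d.cca_label == "VG_CCA"]
        if pool:
            # Highest compression rate first; ties broken by lowest error rate.
            return sorted(pool, key=lambda d: (-d.compression_rate, d.error_rate))
    return []
```

Candidates that share a compression rate end up adjacent in the ranking and are ordered by their error rate, mirroring the two-level classification in the text.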



5 Results 

This section presents results for various open and closed curves. To generate 
results, the algorithm proceeds as follows: for each curve C, at each scale 
$s_j \in S$, search for $PA_C(s_j, p_i)$ labeled $VG_{PA}$, then search by intra- 
and inter-scale classification for the most significant $CCA_C(s_j, p_i, sls_k)$ 
labeled $VG_{CCA}$. 

Fig. 1(a) shows, for an open curve C shaped like a spiral of Archimedes, the best 
constant curvature approximation hypothesis for one scale. In order to highlight 
the span of the circular arcs, grey lines are drawn. The best description 
hypothesis for an open curve shaped like a semi-limaçon of Pascal is shown in 
Fig. 1(b) for working scales $s_j \in [1.0, 3.0]$ with $\delta s = 1.0$. Both 
results exhibit an excellent data compression rate. 

Invariance to similarity transformations such as translation, rotation and 
scaling is an important criterion to which a good algorithm for the segmentation 
and approximation of planar curves must conform in order to provide similar 
descriptions under various conditions. In order to show invariance, four different 
orientations are used on an ellipse-shaped closed curve and results are shown 
in Fig. 2. For this curve and under any condition, each obtained description, 
formed by four circular arcs, is representative of the geometrical shape. Let us 
note that the origin of each circular arc is located on the axes of symmetry of 
the curves. In order to visualize the behavior of the algorithm more adequately 
in the presence of the same curve at several scales, we chose to show results on 
one set made up of astroids. Whatever the scale to which the curve appears, 
its general description must remain the same. The results shown on FigOJa) 
illustrate the very good behavior of the algorithm relative to scaling. 

An interesting aspect of the MuscaGrip algorithm is the conservation of exist- 
ing symmetries. Fig0(b) illustrates this fact by experimenting on a rose-shaped 
closed curve whose complexity is high. The rose is formed by ten petals and the 
constant curvature approximation hypothesis is particularly convincing. Each 
petal is described in the same way. Only circular arcs are included in the de- 
scription. The conservation of symmetries is also visible on Fig 0 and Fig0[a). 
A polygonal approximation of recursive subdivision type, 0, can reduce the 
overall process only when a planar curve is composed of symmetries. On the 
other hand, in the case of an arbitrary curve, a polygonal approximation as 
recommended in MuscaGrip is necessary and cannot be circumvented. 

When noise is added along the curve, the method has to find a final description 
including the same number and the same types of primitives as the description 
that would have been obtained without noise. In the presence of noise, a 
multi-scale method is more suitable and more robust because, no matter what 
occurs, there always exists one scale likely to attenuate 
it. Results shown on FiggI for an ellipse-shaped closed curve are very satisfac- 
tory because, in spite of the more or less significant irregularity in the signal, 
the algorithm is able to provide an acceptable description. 




Fig. 1. Best constant curvature approximation hypothesis. (a) For an open curve C 
shaped like a spiral of Archimedes, composed of 1440 points, at scale $s_j = 1.0$ 
with $\delta s = 0.15$, for $PA_C(s_j, p_i)$ (= 45 sls) labeled $VG_{PA}$, 
$CCA_C(s_j, p_i, sls_k)$ = (3 sls, 8 ca). (b) For an open curve C shaped like a 
semi-limaçon of Pascal, composed of 360 points, at scales $s_j \in [1.0, 3.0]$ 
with $\delta s = 1.0$, for $PA_C(s_j, p_i)$ labeled $VG_{PA}$, 
$CCA_C(s_j, p_i, sls_k)$ = 
[Qsls, 4m)). 



6 Conclusion 

A complete method to extract significant descriptions of planar curves as or- 
dered lists of constant curvature segments was presented. This method is based 
(i) on MuscaGrip, a multi-scale segmentation and curve approximation algo- 
rithm, defined by two grouping processes leading to a multi-scale covering of 
the curve, and (ii) on an intra- and inter-scale classification of this multi-scale 
covering, guided by qualitative labels, leading to a single non-redundant subset. 
The goal is to find a minimal set of pairs composed of (scale, ordered list of con- 
stant curvature segments) to best describe the shape of the curve. Experiments 
on synthetic curves have shown that the proposed method is able to provide 
salient segmentation and approximation results which respect shape description 
and recognition criteria, and which have a good data compression rate. A more 
exhaustive experimental evaluation of algorithms on curves of various types, 
ideal and noisy, and contours from real 2D illuminance images is presented in 
and confirms the good behavior of the method. Furthermore, this research work 
is part of a more generic project for detecting and describing 3D objects in a 
single 2D image based on high-level structures obtained by perceptual grouping 
of constant curvature segments. 

Fig. 2. Invariance to translation and rotation for an ellipse-shaped closed curve, 
composed of 720 points, at scale $s_j = 3.0$. 



Abstract. An algorithm for segmenting 2D shapes into parts is described. 
The segmentation is constructed from the local symmetry axes 
of the shape. The local symmetry axes are determined by analyzing the 
local symmetries of the level curves of a function which is the solution of 
an elliptic PDE. The segmentation has the structure of a directed graph. 
The shapes need not be presmoothed and the algorithm may be applied 
to a complex scene consisting of many objects. 



1 Introduction 

What is meant by shape segmentation in this paper is decomposition of a 2D 
shape into ribbons. The basic idea is to delineate the main body of the shape by 
segmenting out protrusions. The inclusion relations among protrusions induce 
the structure of a directed graph on the segmentation, analogous to the graph 
structure associated with shape skeletons. A finer segmentation may be obtained 
by segmenting the shape across narrow necks. The approach is based on the 
concept of local symmetry axes which was developed by Tari, Shah and Pien 
in [3] and further developed in [4]. Axes of local symmetry are analogous to 
the more commonly used medial axes. If a 2D shape is viewed as a collection of 
ribbons glued together, then the medial axis of each ribbon may be thought of as 
a local symmetry axis. In contrast to the usual use of medial axes to obtain shape 
skeletons, here they are used to segment the shape. The set of local symmetry 
axes is found by analyzing the level curves of a function, v, which is the solution 
of an elliptic PDE. The function v smooths the characteristic function of the 
shape boundary. A point on a level curve of v is a point of local symmetry if 
the level curve is locally symmetric about the gradient vector of v at that point 
up to second order. The local symmetry axes may also be described as the ridges 
and the valleys of the graph of v. The rationale underlying this approach is that 
if a shape has certain symmetries, then the solution of the PDE ought to reflect 
these symmetries. 

One of the motivations behind this approach was to carry out noise suppres- 
sion and extraction of shape properties simultaneously. In this spirit, the level 
curves of v may be thought of as successive smoothings of the shape boundary 
and thus the approach described above has close similarities with that based on 
curve evolution. However, the advantage here is that the necessary properties of 
the level curves are calculated from the differential properties of the function v 
itself, without having to locate the level curves of v. This is not true in the 
case of curve evolution because the time of arrival of the evolving curve at 
various points in the domain almost never defines a function over the domain. 
(The evolving curve may cross a given point several times.) Moreover, the 
derivatives needed in the case of curve evolution are one order higher than in 
the approach proposed here, making the numerical calculations more sensitive to 
noise. As in curve evolution, there is a smoothing parameter in the elliptic PDE 
and as it tends to zero, the function v tends to the (rescaled) distance function. 
Of course, shape analysis based on the distance function has a long history. (For 
some recent developments, see [2].) Local symmetry axes may be used to extract shape 
skeletons, but as in the case of curve evolution, smoothing built into the process 
disconnects the skeleton. It is more useful to employ local symmetry axes to 
segment the shape instead. 

The work described here is also related to that of Zhu [5] who has formulated 
a segmentation functional to draw optimal chords. The optimal chords determine 
ribbon like portions of the shape whose medial axes determine a partial shape 
skeleton. The shape skeleton is completed by joining the medial axes in an op- 
timal way. The advantage of this approach is that the segmentation functional 
provides a goodness criterion for evaluating optimality of shape skeletons and 
also permits a statistical framework. However, the technique requires calculation 
of the shape normals and thus shape has to be presmoothed. 

The version of the algorithm described here extracts the longest possible 
ribbon from the shape by segmenting out protrusions. There are several obvious 
refinements that can be made. It might be necessary to further segment the 
ribbon segments found by the algorithm. The algorithm does provide the option 
of further segmenting each of the ribbon segments by creating cuts originating 
at the saddlepoints of v. However, this may or may not be desirable depending 
on the application. For example, in shapes with a long neck, a saddlepoint will 
be formed at its narrowest point and the neck will be segmented there. However, 
it might be preferable to isolate the whole neck as one object. The algorithm 
is not sensitive to special symmetries such as those exhibited by a square or a 
rectangle. It has to be modified if these special coincidences have to be taken 
into account. For instance, depending on the numerical choices made, it will 
segment out two of the corners of a rectangle as protrusions, identifying the rest 
as the longest ribbon that can be extracted. Such a segmentation would appear 
reasonable in the case of a parallelogram in which case the two obtuse angles will 
be segmented out as protrusions and a long ribbon around the long diagonal will 
be extracted. Since the algorithm does not recognize special cases, it treats the 
rectangle as a generic parallelogram. Another example where the algorithm does 
not recognize symmetry is provided by the star-shape shown in Fig. 1. Instead 
of treating all the arms on the same footing, the algorithm picks out two of them 
to make up the longest possible ribbon and segments out the others. 



2 Smoothing of the Shape Boundary 



A shape is described by specifying its boundary in the form of a collection of 
curves, $\Gamma$, inside a bounded domain D of the plane. All that is necessary is 
that $\Gamma$ be sufficiently regular for the solution of the differential 
equation given below to exist. We consider the usual functional for smoothing the 
characteristic function of $\Gamma$: 



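\[ E_\rho(v) = \frac{1}{2} \iint_D \left\{ \rho\,\|\nabla v\|^2 + \frac{v^2}{\rho} \right\} dx\,dy \]

with the boundary condition $v = 1$ along $\Gamma$, where the characteristic 
function $\chi_\Gamma$ is 

\[
\chi_\Gamma =
\begin{cases}
1 & \text{along } \Gamma \\
0 & \text{elsewhere}
\end{cases}
\qquad (3)
\]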



Alternative smoothing strategies are possible. The advantage of this particular 
functional is that it behaves correctly in the limit as $\rho \to 0$: 

\[ \liminf_{\rho \to 0} E_\rho(v) = \mathrm{length}(\Gamma) \qquad (4) \]

The minimizer of $E_\rho$ satisfies the elliptic differential equation 

\[ \nabla^2 v = \frac{v}{\rho^2} \qquad (5) \]

with boundary conditions $v = 1$ along $\Gamma$ and $\partial v / \partial n = 0$ 
along the boundary of D. The parameter $\rho$ plays the role of the smoothing 
radius. 
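As a rough illustration of how such a smoothed function might be computed on a pixel grid, the following Python sketch relaxes a discretized screened Laplace equation. It rests on several assumptions that are not in the text: the PDE is read as the Laplacian of v equal to v divided by the square of the smoothing radius (called `rho` here), the grid spacing is one pixel, Gauss-Seidel sweeps are used, and the zero normal derivative on the frame is approximated by replicating border values. The function name and the `sweeps` parameter are hypothetical.

```python
def smooth_boundary(gamma, rho, sweeps=500):
    """Relax laplacian(v) = v / rho**2 with v = 1 on the boundary pixels.

    gamma is a 2-D list of booleans marking the shape boundary pixels.
    """
    rows, cols = len(gamma), len(gamma[0])
    v = [[1.0 if gamma[i][j] else 0.0 for j in range(cols)] for i in range(rows)]
    # Five-point stencil: (sum of neighbours - 4 v) = v / rho^2.
    denom = 4.0 + 1.0 / (rho * rho)
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                if gamma[i][j]:
                    continue  # Dirichlet condition: v stays 1 on the boundary
                # Clamped indices replicate the border value, which
                # approximates a zero normal derivative on the frame.
                up = v[max(i - 1, 0)][j]
                down = v[min(i + 1, rows - 1)][j]
                left = v[i][max(j - 1, 0)]
                right = v[i][min(j + 1, cols - 1)]
                v[i][j] = (up + down + left + right) / denom
    return v
```

With a small radius the values fall off quickly away from the boundary, while a large radius spreads them further into the interior, which is the sense in which the parameter acts as a smoothing radius.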

Although what is relevant here is the global behavior of v which determines 
the axes of local symmetries, it is interesting to note that when $\rho$ is small 
compared to the local width of the shape and the local radius of curvature of 
$\Gamma$, the level curves of v locally capture the smoothing of $\Gamma$ by curve 
evolution. As shown in Appendix (3) of [1], when $\rho$ is small, 

\[ \frac{\partial v}{\partial n} \approx \frac{v}{\rho}\left(1 - \frac{\rho\kappa}{2}\right) \qquad (6) \]

where $\kappa(x, y)$ is the curvature of the level curve passing through the point (x, y) 
and n is the direction of the gradient. If we imagine moving from one level curve 
to another along the normals, then a small change $\delta v$ in the level requires 
a movement 

\[ \delta r \approx \frac{\rho}{v}\left(1 + \frac{\rho\kappa}{2}\right)\delta v \qquad (7) \]

where r denotes the arc length along the gradient lines of v. Define time t such 
that $\frac{dt}{\rho^2} = \frac{dv}{2v}$. Then 

\[ \frac{dr}{dt} \approx \frac{2}{\rho} + \kappa \qquad (8) \]

Fig. 1. Left: Level curves of v. Right: $S_1$. 
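The step from (7) to (8) is a one-line chain-rule computation. The check below assumes that (7) reads $\delta r \approx (\rho/v)(1 + \rho\kappa/2)\,\delta v$ and that the time variable is defined by $dt/\rho^2 = dv/(2v)$; under those readings the two relations combine exactly into the stated speed law.

```latex
% Consistency check of (8) from (7) and the definition of t.
\begin{align*}
\frac{dr}{dv} &\approx \frac{\rho}{v}\left(1 + \frac{\rho\kappa}{2}\right), &
\frac{dt}{dv} &= \frac{\rho^{2}}{2v}, \\
\frac{dr}{dt} &= \frac{dr/dv}{dt/dv}
  \approx \frac{\rho}{v}\left(1 + \frac{\rho\kappa}{2}\right)\frac{2v}{\rho^{2}}
  = \frac{2}{\rho} + \kappa .
\end{align*}
```

The constant term $2/\rho$ is a uniform normal speed and the term $\kappa$ is the curvature-dependent part, which is why the level curves locally resemble a curve evolving by constant motion plus curvature.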



As pointed out above, the global behavior of the level curves is radically different 
from that of curve evolution since the value of v at a point cannot be determined 
by its values in a local neighborhood. For example, consider a closed level curve 
with large width and small curvature everywhere so that its evolution mimics 
the curve evolution. As the level curve shrinks and narrows, interaction between 
its opposite sides becomes significant and the gradient of v will be less than what 
it would be without the interaction. For instance, the level curve speeds up as it 
nears a saddlepoint. 



3 Local Symmetries, Medial Axes, and Skeletons 

Loci of local symmetries are now defined by analyzing the local symmetries of 
the level curves of v. These loci consist of one-dimensional branches and their 
terminal points. The level curves of v inside a starlike shape are shown on the 
left in Fig. 1. (In all the figures, most of the region in the frame D outside the 
shape is not shown.) Notice that along the apparent medial axes of protrusions, 
the level curves are further separated than they are in the neighborhood of 
indentations. The tips of protrusions are in some sense furthest away from the 
apparent center of the shape. Now the distance between two adjacent level curves 
is given by $\delta v / \|\nabla v\|$. If we define the semimetric 

\[ dv/dl \qquad (9) \]

where dl is the infinitesimal Euclidean distance, then the geodesics satisfy the 
equation 

\[ \frac{d\|\nabla v\|}{ds} = 0 \qquad (10) \]



where s is the arc-length along the level curves of v. The symmetry of the level 
curve at a point P where $d\|\nabla v\|/ds = 0$ is revealed by the missing 
$\eta\xi$-term in the Taylor expansion of v in terms of the local coordinates 
$\eta$ and $\xi$, where $\eta$ is in the direction of $\nabla v$ and $\xi$ is 
tangent to the level curve: 

\[ v = a_{00} + a_{10}\eta + a_{01}\xi + a_{20}\eta^2 + a_{02}\xi^2 + \cdots \qquad (11) \]

Thus locally at P, the level curve $v = a_{00}$ is approximately a conic section, 
one of whose principal axes coincides with the gradient vector. An equivalent 
description of the symmetry at P is that the Hessian of v at P is diagonalized 
when expressed in terms of the local coordinates $\eta$ and $\xi$. This means that 
the gradient vector $\nabla v$ is an eigenvector of the Hessian at P. The last 
description may be generalized to define partial symmetries of shapes in 
dimensions > 2 [4]. 
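The eigenvector condition can be tested numerically from the gradient and the Hessian alone. The sketch below uses the identity that the derivative of the gradient magnitude along the level curve equals the tangent vector applied to the Hessian and the gradient, divided by the gradient magnitude; it vanishes exactly when the gradient is an eigenvector of the Hessian. The function name and the argument layout are illustrative assumptions.

```python
import math


def level_curve_symmetry_measure(grad, hess):
    """Derivative of the gradient magnitude along the level curve.

    grad = (vx, vy); hess = ((vxx, vxy), (vxy, vyy)).
    Returns t . (H g) / |g|, with t the unit tangent to the level curve.
    """
    gx, gy = grad
    (vxx, vxy), (_, vyy) = hess
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        raise ValueError("gradient vanishes; the tangent direction is undefined")
    tx, ty = -gy / norm, gx / norm   # unit tangent: gradient rotated by 90 degrees
    hgx = vxx * gx + vxy * gy        # first component of H g
    hgy = vxy * gx + vyy * gy        # second component of H g
    return (tx * hgx + ty * hgy) / norm
```

For the quadratic v = 3x^2 + y^2 the measure is zero at any point on a coordinate axis, where the gradient is an eigenvector of the Hessian, and non-zero at a generic point such as (1, 1).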

As explained above, along the middle of protrusions, the distance between 
adjacent level curves is the greatest, that is, $\|\nabla v\|$ is minimum along the 
level curve. So let $S_1^+$ denote the closure of the set of zero-crossings of 
$d\|\nabla v\|/ds$ where $d^2\|\nabla v\|/ds^2$ is positive and let $S_1^-$ denote 
the closure of the set of zero-crossings of $d\|\nabla v\|/ds$ where 
$d^2\|\nabla v\|/ds^2$ is negative. The connected components of 
$S_1 \setminus (S_1^+ \cap S_1^-)$ are called the branches of $S_1$. The direction 
of each branch is in the direction of increasing v. The locus $S_1$ in the case of 
the star-like shape is shown on the right in Fig. 1. (Note the characteristic 
behavior as a branch approaches the boundary of D.) 

The set $S_1^+ \cap S_1^-$ consists of the terminal points of the branches of 
$S_1$ and it is the union of two sets, $S_0$ and J. The set $S_0$ is defined by 
the equation $\|\nabla v\| = 0$ and the set J is defined by the equations 
$d\|\nabla v\|/ds = 0$, $d^2\|\nabla v\|/ds^2 = 0$. The set $S_0$ may be further 
subdivided into the set of elliptic points where the determinant of the Hessian 
of v is positive, the set of hyperbolic points where it is negative and the set 
of parabolic points where it is zero. At an elliptic point, v has a local minimum 
and has a Taylor expansion of the form $a_{00} + a_{20}x^2 + a_{02}y^2 +$ higher 
order terms. By applying the definition of $S_1^+$ and $S_1^-$ to this local 
expression, it is easy to see that at an elliptic point, there are two branches 
of $S_1^-$ directed away from the point in the direction of the maximum second 
derivative and two branches of $S_1^+$ directed away in the direction of the 
minimum second derivative. At a hyperbolic point, v has a Taylor expansion of the 
form $a_{00} + a_{11}xy +$ higher order terms and calculations show that at a 
hyperbolic point, there are four branches of $S_1$, all of which belong to 
$S_1^+$. Hyperbolic points are of course saddlepoints in that two of these 
branches are directed away from the saddlepoint and two are directed towards it. 
In theory, the set of parabolic points may be one-dimensional, but numerically it 
is impossible to identify such points without setting some kind of numerical 
threshold. What we find is that a parabolic line is seen numerically as a series 
of elliptic and hyperbolic points, making analysis of parabolic points difficult. 
However, from the point of view of segmentation, all that is needed is 
determination of the kind of branches of $S_1$ that are present in a tubular 
neighborhood. 

Fig. 2. Left: $S_1$ with $\rho = 128$. Right: $S_1$ with $\rho = 8$. 
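The classification of points with vanishing gradient by the sign of the Hessian determinant can be written down directly. In this sketch the tolerance `tol` plays the part of the numerical threshold mentioned above; its default value and the function name are assumptions made for illustration only.

```python
def classify_critical_point(vxx, vxy, vyy, tol=1e-9):
    """Classify a point where the gradient vanishes by the Hessian determinant."""
    det = vxx * vyy - vxy * vxy
    if det > tol:
        return "elliptic"
    if det < -tol:
        return "hyperbolic"
    # Within the tolerance the determinant is treated as zero.
    return "parabolic"
```

With `tol` set to zero, almost no computed point would be reported as parabolic, which matches the observation that a parabolic line shows up numerically as a chain of elliptic and hyperbolic points.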

Generically, a point in J is a junction of a branch from $S_1^+$ and a branch from 
$S_1^-$. As in the case of parabolic lines, in the absence of a numerical 
threshold, all points in J are numerically regarded as belonging to this category. 
A point in J belongs to the subset $J^+$ if the two branches of $S_1$ are directed 
away from it and it belongs to the subset $J^-$ if they are directed towards it. 
At points of J, v has a local maximum or a local minimum when restricted to 
$S_1$. It is minimum if the point belongs to $J^+$ and maximum if it belongs to 
$J^-$. Junctions of type $J^-$ arise when a parabolic line breaks up into a series 
of elliptic and hyperbolic points or when there is a protrusion present near a 
neck. The latter case can be seen in Fig. 1. Shape protrusions narrow the space 
between the shape and the boundary of D, creating saddlepoints along the boundary 
of D. A branch of $S_1^+$ from such a saddlepoint is linked to a branch of $S_1^+$ 
emanating from a nearby shape indentation by a segment of $S_1^-$. (Note that an 
indentation in the shape behaves like a protrusion when seen from outside the 
shape.) In principle, there should be exactly one point of $J^+$ and one point of 
$J^-$ in this linkage. However, there is numerical degeneracy due to the fact that 
the level curves $d\|\nabla v\|/ds = 0$ and $d^2\|\nabla v\|/ds^2 = 0$ are nearly 
coincident over some distance, creating a series of points numerically identified 
as points of J (see the saddlepoint on the right). 

Since there are only four branches at an elliptic point, most branches of $S_1$ 
end up at points in J. The smaller the protrusion, the shorter the branch. Notice 
the extremely short branches near the shape boundary, created by the noise in 
the boundary. The two branches of $S_1^+$ meeting at the elliptic point inside the 
star define the medial axis of the longest ribbon that can be extracted from the 
shape. 

The construction described above depends on the choice of the smoothing 
parameter $\rho$. The locus $S_1$ shown in Fig. 1 was obtained with $\rho = 82$ 
pixels. (The size of the frame D is 400 x 400 pixels.) Figure 2 depicts the same 
locus determined using $\rho = 8$ and $\rho = 128$. The features that are sensitive 
to the choice of $\rho$ are the location of the points in J and the number of 
saddlepoints. The larger the value of $\rho$, the shorter the protrusion axes. What 
recedes are the portions of these axes projecting outside the protrusions, and 
the points in J move closer to the corresponding shape projections. Since the 
function v emulates the distance function more and more closely as $\rho$ tends to 
zero, it begins to detect even very wide necks by creating more saddlepoints. 

We conclude this section with the definition of the shape skeleton: 

Definitions: A medial axis is a branch of $S_1^+$ which starts at an elliptic 
point or at a point in $J^+$ and ends either at the shape boundary or at a 
saddlepoint. A medial axis starting at an elliptic point is called a main axis 
while a medial axis starting at a point in $J^+$ is called a protrusion axis. The 
skeleton of the shape is the union of its medial axes. 

As noted before, a shape skeleton need not be connected. 

4 Segmentation 

The medial axes detect the "corners" of the shape. The saliency or the extent of 
each corner may be gauged by the length of the associated medial axis. However, 
the main objective of this paper is to segment protrusions and indentations by 
means of their medial axes. The basic idea is to find the two nearest points 
on the shape boundary from the terminal point of the protrusion axis, one on 
each side of the axis, and connect these two points to segment the protrusion. 
The important point is to restrict the search to a suitable neighborhood of the 
protrusion axis. To solve this problem, we use the fact that $S_1$ segments the 
frame D and inside each connected component of $D \setminus S_1$, 
$d\|\nabla v\|/ds$ is either positive or negative. Segmentation of D by the 
zero-crossings of $d\|\nabla v\|/ds$ in the case of the star-figure is shown on the 
left in Fig. 3. Each protrusion axis neighbors exactly two of these components. 
Therefore the search for the nearest boundary points is restricted to the 
interior of these two components adjoining the axis. Admittedly, the boundary 
points found in this way depend on where the terminal point of the protrusion 
axis is, which in turn depends on the choice of $\rho$. However, if the ends of a 
protrusion are marked by a sharp change in the local width of the shape, the 
boundary points are insensitive to $\rho$. In the special cases when this is not 
true, as in the case of a parallelogram, the larger the value of $\rho$, the 
further away the boundary points are from the obtuse corners which are 
interpreted as protrusions. 

It is possible to segment shapes across necks by means of the associated 
saddlepoints. This is a more delicate construction. Here the problem is to avoid 
spurious saddlepoints arising from the numerical break-up of parabolic lines. 
The difficulty is that there may be irrelevant small segments of 
$D \setminus S_1$ adjoining the saddlepoint. Therefore, a hyperbolic point is 
called a true saddlepoint if it adjoins at least three segments of 
$D \setminus S_1$ which touch the shape boundary and if the saddlepoint is not a 
point on the boundary of D. (The last condition avoids saddlepoints artificially 
introduced by the frame D.) Going through each saddlepoint is a medial axis, 
directed away from it, and the problem is to find the two nearest boundary 
points, one on each side of the medial axis. Restrict the search for the nearest 
boundary points to only these adjoining segments of $D \setminus S_1$. Once the 
two boundary points are found, one on each side of the medial axis, connect the 
saddlepoint to each of them. This construction may still produce double 
segmentation lines. This happens if there are two branches of $S_1$ leaving the 
theoretical parabolic line from two different points to meet the shape boundary 
or another true saddlepoint. The solution in this case is to search in an 
appropriately chosen tubular neighborhood of the medial axis through the 
saddlepoint and treat the two terminal points on the parabolic line as a single 
unit. 

Fig. 3. Left: $d\|\nabla v\|/ds > 0$ dark, $d\|\nabla v\|/ds < 0$ light. Right: Segmentation. 

Figure 3 on the right shows the segmentation of the star figure. The shape is 
segmented from inside as well as from outside. As noted before, in its attempt 
to extract the longest possible ribbon, the algorithm disregards the approximate 
symmetry of the star and includes in the main ribbon two of what would normally 
be perceived as protrusions. 

The segmentation has the structure of a directed graph. Its set of vertices 
consists of the true saddlepoints and one vertex for each of the shape segments. 
If a segment X is a protrusion with a medial axis originating at a point in $J^+$ 
which is contained in another segment Y, then YX is an edge in the graph. (The 
direction of an edge is always in the direction of increasing v.) Each segmentation 
line through a saddlepoint is the common boundary between two shape segments. 
The vertices corresponding to these two segments are connected to the vertex 
corresponding to the saddlepoint by edges directed towards the saddlepoint. 
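The graph structure just described can be assembled from two lists of relations. The sketch below is one possible encoding, with hypothetical names throughout: protrusion edges are read as directed from the containing segment to the protrusion, and each segmentation line through a saddlepoint contributes two edges directed towards the saddlepoint vertex.

```python
def build_segmentation_graph(protrusions, neck_cuts):
    """Build the directed segmentation graph as (vertices, edges).

    protrusions: iterable of (containing_segment, protrusion_segment) pairs.
    neck_cuts:   iterable of (saddlepoint, segment_a, segment_b) triples.
    """
    edges = set()
    for parent, child in protrusions:
        # Edge from the containing segment to the protrusion.
        edges.add((parent, child))
    for saddle, seg_a, seg_b in neck_cuts:
        # Both neighbouring segments point towards the saddlepoint vertex.
        edges.add((seg_a, saddle))
        edges.add((seg_b, saddle))
    vertices = {node for edge in edges for node in edge}
    return vertices, edges
```

For example, a body with one arm and a neck cut separating it from a head yields three edges: one from the body to the arm and two into the saddlepoint vertex.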

Additional examples of shape segmentation are shown in Fig. 4. All the quantities 
needed in the algorithm were computed using 3x3 neighborhoods except the sign of 
$d^2\|\nabla v\|/ds^2$ which required 4x3 neighborhoods. (Each shape was scaled to 
make sure that it was nowhere less than 4 pixels wide.) In Fig. 4, the top row on 
the left shows the loci $S_1$ while the bottom row shows the corresponding 
segmented shapes. The example of the brain segmentation shown on the right 
illustrates the case of a complex of shapes involving non-simply connected shapes 
and triple junctions. Note that the shape boundary is outlined by thick jagged 
lines while the segmentation lines are straight and thinner. 

Fig. 4. Left: top row: $S_1$, bottom row: Segmentation. Right: Complex Shape. 



Abstract. A method to automatically select locally appropriate scales 
for feature detection, proposed by Lindeberg, involves choosing 
a so-called $\gamma$-parameter. The implications of the choice of $\gamma$-parameter 
are studied and it is demonstrated that different values of $\gamma$ can lead to 
qualitatively different features being detected. As an example the range 
of $\gamma$-values is determined such that a second derivative of Gaussian filter 
kernel detects ridges but not edges. Some results of this relatively simple 
ridge detector are shown for two-dimensional images. 

Keywords: Scale selection, ridge detection. 



1 Introduction 

The response of a local operator to an image depends on just how local the 
operator is. Figure 1 demonstrates this scale-dependence for an operator that 
computes the principal curvature of the image intensity. At small scales the 
edges of the diamonds produce a strong response, at larger scales the long axis 
of each diamond is particularly pronounced, and at still larger scales the rows of 
diamonds stand out. 

Scale-dependence has received much attention during the last two decades. 
On one hand it creates the problem of having to deal with all scales whenever 
there is no prior information available about the scales that occur in an image. 
On the other hand it creates the possibility to automatically determine the scales 
of structures in an image. 

In medical image processing, for example, several studies have demonstrated 
the possibility to determine both the position and the width (scale) of linelike 
structures such as blood vessels p, 0, HU], H2I. The selection of locally appro- 
priate scales along such structures not only yields local estimates of the width 
but it also makes it possible to track structures whose width varies considerably. 

The method for automatic scale-selection employed in these studies was proposed 
by Lindeberg. It requires choosing one parameter, called the $\gamma$-parameter. 
In these studies the parameter was chosen in such a way that for 
some model structure the automatically selected scale exactly reproduced the 
known scale of the model. All these studies also report that “the most serious 
problem in the application of these [linear, second derivative of Gaussian] filters 
is their response to other features, such as edges or ‘sheets’ in 3D.” (page 40). 

The aim of this article is to demonstrate that the choice of $\gamma$ determines 
not only quantitatively the scales of detected features but also qualitatively the 
type of features that are detected. In the case of linelike structures, feature 
detection with automatic scale selection can be accomplished with a simple second 
derivative of Gaussian filter kernel. While at fixed scales this detector suffers 
from the abovementioned problem of also detecting edges, the appropriate choice of 
$\gamma$ makes it possible to avoid edge detection in the variable scale setting. 

The article is organized as follows. First the method for automatic 
scale-selection by Lindeberg is briefly reviewed. Then the influence of the 
$\gamma$-parameter on feature detection is studied in general. Next a short catalog 
of critical $\gamma$-values for several one-dimensional model structures is 
created. This catalog suggests a range of $\gamma$-values at which a second 
derivative of Gaussian operator detects ridges but not edges. This possibility is 
studied in detail in one dimension. Finally the problem of detecting linelike 
structures in two-dimensional images is studied and examples are given. 




Fig. 1. Original image (left) and principal curvature computed at three different
scales. The image has 512 by 512 pixels and the scales are $\sqrt{t} = 8$, $\sqrt{t} = 16$,
$\sqrt{t} = 24$ pixels.



2 Scale-Selection Using $\gamma$-Normalized Derivatives



A method for automatic scale-selection that deals with positions and scales
simultaneously was proposed by Lindeberg in 1993 |^. It is a generalization of the
idea to normalize the response of edge-detectors described earlier by Korn 0 .
The method deals with derivative of Gaussian operators

$$G^n(\mathbf{x}; t) = \partial_1^{n_1} \cdots \partial_N^{n_N} \, \frac{e^{-\mathbf{x}^T \mathbf{x} / (2t)}}{(2\pi t)^{N/2}}$$

where $\partial_i^{n_i}$ denotes the $n_i$-th order derivative along the $i$-th Cartesian coordinate,
$N$ is the dimension of the image, usually 2 or 3, and $t$ is the scale-parameter.
The response of these operators to an image $I$ is computed by convolution and
will be denoted as

$$L_n(\mathbf{x}; t) = (G^n(\cdot\,; t) * I)(\mathbf{x})$$
It should be noted that the response of a derivative of Gaussian operator
computed in Cartesian coordinates does not itself capture useful structural
information about an image because the Cartesian coordinates are generally not related
to image structures. Useful structural operators can be constructed from
combinations of the response of several derivative of Gaussian operators as described
in 0.

In analogy to feature detection where local extrema of the operator response
are computed with respect to space, one might wish to select scales in terms of
local extrema of the operator response with respect to scale. Unfortunately, for
derivative of Gaussian operators this makes little sense because the amplitude
of a derivative tends to decrease with increasing scale as a simple consequence
of the fact that with increasing scale the response is increasingly smoothed.
This prompted Lindeberg to consider $\gamma$-normalized derivatives

$$L_{n,\gamma\text{-norm}}(\mathbf{x}; t) = t^{n\gamma/2}\, (G^n(\cdot\,; t) * I)(\mathbf{x}) = t^{n\gamma/2} L_n(\mathbf{x}; t) \qquad (1)$$

where $n = n_1 + \cdots + n_N$. The amplitude of $\gamma$-normalized derivatives is
obviously greater than that of regular derivatives when $t > 1$ and $\gamma > 0$. Lindeberg
proposed the following heuristic principle 0:

In the absence of other evidence, a scale level at which some (possibly
non-linear) combination of normalized derivatives assumes a local
maximum can be treated as reflecting the characteristic length of a
corresponding structure in the data.

In combination with feature detection this method of scale selection amounts
to finding those (position, scale)-pairs $(\mathbf{x}, t)$ where the $\gamma$-normalized operator
response has an extremum with respect to position and scale.

The idea proved to be very useful and it has since been applied to detect
blood vessels P, PH], and other structures whose size is of interest 0, [12],
0.

3 Implications of Different $\gamma$

To analyze the influence of $\gamma$ on scale selection consider the necessary conditions
for the $\gamma$-normalized response $t^{n\gamma/2} L_n(\mathbf{x}; t)$ to have a local extremum with
respect to position and scale:

$$0 = \partial_i L_n(\mathbf{x}; t), \quad i = 1, \ldots, N$$

$$0 = \frac{n\gamma}{2}\, t^{-1} L_n(\mathbf{x}; t) + \partial_t L_n(\mathbf{x}; t) \qquad (2)$$

Figure 2 shows solutions to these equations for a second derivative of Gaussian
operator applied to a one-dimensional image. The zero-crossings of the derivative
with respect to space are independent of $\gamma$. The zero-crossings of the derivative with
respect to scale show a tendency to move to larger scales with increasing $\gamma$. The
most negative pit at $x \approx 40$, for example, is assigned the three different scales
$\sqrt{t} = 5$, $\sqrt{t} = 10$, $\sqrt{t} = 21$ at the $\gamma$-values $\gamma = 0.5$, $\gamma = 1.5$, $\gamma = 2.5$ respectively.
Fig. 2. Original one-dimensional image and zero-crossings of the response of
the second derivative of Gaussian as functions of $x$ and $\sigma = \sqrt{t}$ for three
different values of $\gamma$. The solutions of $0 = \partial_x L_n(x; t)$ are shown in light gray,
those of $0 = \frac{n\gamma}{2} t^{-1} L_n(x; t) + \partial_t L_n(x; t)$ in dark gray.



This influence of $\gamma$ on the selected scales suggests that the value of $\gamma$ may
be adjusted such that some “correct” scale is selected for a model feature whose
scale is known or defined a priori. Most previous work on scale-selection
motivates the choice of $\gamma$ in this way. To give an example consider the one-dimensional
model $\frac{1}{\sqrt{2\pi w}} e^{-x^2/(2w)} = G(x; w)$ of width $w$. Suppose one attempts to detect such
structures in one-dimensional images and one chooses to do so with a second
derivative of Gaussian operator. The response of the $\gamma$-normalized second
derivative of Gaussian to this model is

$$t^{\gamma} G^2(x; t + w) = t^{\gamma} \left( \frac{x^2}{(t + w)^2} - \frac{1}{t + w} \right) G(x; t + w).$$

The maximum over scales at the center $x = 0$, which is the position of interest,
occurs at $t = \frac{2\gamma}{3 - 2\gamma}\, w$ (for $0 < \gamma < 3/2$). To achieve a one to one
correspondence between the selected scale and the width of the model, $\gamma$ must be
set to $\gamma = 3/4$.
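The worked example above can be checked numerically. The following is a minimal sketch, assuming the one-dimensional Gaussian model $G(x; w)$ with $w$ acting as a variance and the normalization factor $t^{\gamma}$ for the second derivative; the function names are illustrative and not taken from the original work.

```python
import math

def normalized_response(t, w, gamma):
    """Magnitude at x = 0 of the gamma-normalized second derivative of
    Gaussian applied to the model G(x; w): t**gamma * |G_xx(0; t + w)|."""
    s = t + w  # variances add when two Gaussians are convolved
    return t ** gamma / (s * math.sqrt(2.0 * math.pi * s))

def selected_scale(w, gamma, t_max=200.0, steps=200_000):
    """Brute-force search over a regular grid for the scale t that
    maximizes the normalized response at the center of the model."""
    best_t, best_r = None, -1.0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        r = normalized_response(t, w, gamma)
        if r > best_r:
            best_t, best_r = t, r
    return best_t

def predicted_scale(w, gamma):
    """Closed form t = 2*gamma*w / (3 - 2*gamma), valid for 0 < gamma < 3/2."""
    return 2.0 * gamma * w / (3.0 - 2.0 * gamma)

if __name__ == "__main__":
    for gamma in (0.5, 0.75, 1.0):
        print(gamma, selected_scale(4.0, gamma), predicted_scale(4.0, gamma))
```

With $\gamma = 3/4$ the grid search returns a scale equal to the model width, in line with the choice discussed in the text; other values of $\gamma$ shift the selected scale away from $w$.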

A more far reaching consequence of the choice of $\gamma$-value is the following. For
any specific image structure there is a certain range of $\gamma$-values within which
the structure is assigned a finite scale by the $\gamma$-normalized operator used for
detection. For values of $\gamma$ greater than some critical $\gamma$ the extrema with respect
to scale are pushed to infinity so that above this critical $\gamma$ the structure cannot
be detected. In other words, the choice of $\gamma$ determines which structures can be
detected and which cannot.

The implications are as evident as they are surprising: A feature detection
operator applied at some fixed scale generally responds to a number of different
structures, e.g. edges and ridges. At variable scales the $\gamma$-normalized operator
responds only to a subset of the structures detected at fixed scales. The
$\gamma$-parameter can be used to adjust this subset. For example, in the application
considered in section 6 where ridges are of interest and edges are not, $\gamma$ can be
adjusted such that only ridges but not edges are detected.

4 A Short Catalog of Critical $\gamma$-Values

In order to apply scale-selection to “turn off” specific features that are not of
interest we briefly report the critical $\gamma$-values for a few one-dimensional model
structures and derivative of Gaussian operators. A more detailed treatment
might be given elsewhere.

Step Edge. For all derivative of Gaussian operators $\gamma = 1$ is the critical $\gamma$-value
of a step edge $\theta(x)$, where $\theta(x) = 1$ for $x < 0$ and $\theta(x) = 0$ for $x > 0$.

Gaussian Edge. For all derivative of Gaussian operators $\gamma = 1$ is the critical
$\gamma$-value of a Gaussian edge $g(x; w) = \int dx\, G(x; w^2)$ of width $w$, which is
simply a step edge smoothed with a Gaussian filter kernel.

Lindeberg 0 determined this critical $\gamma$-value in the context of edge detection
and suggested using $\gamma = 1/2$ in order to detect the edge at its “correct” scale
t = w. Here we wish to emphasize the use of values $\gamma < 1$ to enable detection of
edges and values $\gamma > 1$ to avoid detection of edges.

Step Ridge. The response of a derivative of Gaussian operator $G^n$ to a step
ridge $r(x; w)$ of width $w$, defined by $r(x; w) = 1$ for $-w < x < w$, $r(x; w) = 0$
otherwise, is the difference of two step edge responses.

For a second derivative of Gaussian operator the critical $\gamma$-value is $\gamma = 3/2$.
For $n = 4$ the critical $\gamma$-value is $\gamma = 5/4$. One may conjecture that the
critical $\gamma$-values of derivative of Gaussian operators applied to the step ridge are
$\gamma = \frac{n+1}{n}$.

Gaussian Ridge. The response of a derivative of Gaussian operator $G^n$ to a
Gaussian ridge $G(x; w^2)$ of width $w$ is simply $G^n(x; t + w^2)$. The critical $\gamma$-value
of a derivative of Gaussian operator $G^n$ applied to a Gaussian ridge is $\gamma = \frac{n+1}{n}$.
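The critical values in this catalog can be made plausible with a short scaling argument. The following is a sketch rather than the original derivation: $\varphi$ denotes the unit-variance Gaussian and $\varphi^{(k)}$ its $k$-th derivative (both symbols are introduced here for illustration only), and the normalization factor is assumed to be $t^{n\gamma/2}$ for an $n$-th order derivative.

```latex
% One-dimensional Gaussian and its derivatives in scaled form
\[
G(x;t) = t^{-1/2}\,\varphi\!\left(\tfrac{x}{\sqrt{t}}\right), \qquad
\partial_x^{k} G(x;t) = t^{-(k+1)/2}\,\varphi^{(k)}\!\left(\tfrac{x}{\sqrt{t}}\right).
\]

% Step edge: the n-th derivative of the smoothed step equals, up to sign,
% the (n-1)-th derivative of the Gaussian
\[
t^{n\gamma/2}\,\bigl|\partial_x^{n}(\theta * G)(x;t)\bigr|
  = t^{n\gamma/2}\,t^{-n/2}\,\bigl|\varphi^{(n-1)}\!\left(\tfrac{x}{\sqrt{t}}\right)\bigr|
  = t^{\,n(\gamma-1)/2}\,\bigl|\varphi^{(n-1)}\!\left(\tfrac{x}{\sqrt{t}}\right)\bigr| .
\]
% Along an extremum in position (fixed x/sqrt(t)) the amplitude behaves as
% t^{n(gamma-1)/2}: it decreases with t for gamma < 1 (extremum at t -> 0),
% is constant for gamma = 1, and grows without bound for gamma > 1.

% Gaussian ridge of width w, even n, evaluated at the center x = 0
\[
f(t) = t^{n\gamma/2}\,\bigl|G^{n}(0;\,t+w^{2})\bigr|
     = \bigl|\varphi^{(n)}(0)\bigr|\; t^{n\gamma/2}\,(t+w^{2})^{-(n+1)/2},
\]
\[
\frac{d}{dt}\ln f(t) = \frac{n\gamma}{2t} - \frac{n+1}{2\,(t+w^{2})} = 0
\;\;\Longrightarrow\;\;
t = \frac{n\gamma\,w^{2}}{\,n+1-n\gamma\,} .
\]
% The maximizing scale is finite and positive only for gamma < (n+1)/n,
% and it moves to infinity as gamma approaches (n+1)/n.
```

For $n = 2$ the last expression reduces to $t = 2\gamma w^2 / (3 - 2\gamma)$, with critical value $\gamma = 3/2$.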

5 Ridges versus Edges 

The critical $\gamma$-values shall now be applied to demonstrate how a second derivative
of Gaussian operator can be used to detect ridges but not edges. This makes it
possible to avoid detecting the edges that naturally occur on both sides of a ridge.
For a second derivative of Gaussian operator the critical $\gamma$-value of an
edge is $\gamma = 1$ and the critical $\gamma$-value of a ridge is $\gamma = 1.5$. Consequently within
the range $1 < \gamma < 1.5$ this operator detects only ridges and not edges. In other
words, the operator responds to a ridge with only a single extremum over position
and scale $(x, t)$ and this extremum occurs at the center of the ridge.

Figure 3 displays the response of a second derivative of Gaussian operator
to a step ridge for different values of $\gamma$. The response $GL_{xx}$ is drawn as a surface.
Below the surface, zero-crossings of the first derivatives along space and scale are
shown in order to aid the visual inspection of local extrema.

It can be seen clearly that the maximum response corresponding to the center
of the ridge is “pushed” towards larger scales with increasing $\gamma$, until at $\gamma = 1.5$
it disappears. Moreover, as expected, the maxima corresponding to the edges
occur at small scales ($t = 0$) as long as $\gamma < 1$ and disappear when $\gamma \geq 1$.
Hence of the displayed $\gamma$-values the only value at which the ridge is detected
and the edges are not is $\gamma = 1.25$.
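The behaviour of the central maximum can be reproduced with a few lines of code. This is a minimal sketch, assuming a step ridge on $[-w, w]$ and the normalization factor $t^{\gamma}$; at the ridge center the second derivative response has magnitude $2 (w/t)\, G(w; t)$, which follows from writing the ridge as a difference of two step edges. The function names are illustrative.

```python
import math

def ridge_center_response(t, w, gamma):
    """Magnitude at x = 0 of the gamma-normalized second derivative of
    Gaussian applied to a step ridge on [-w, w]."""
    gauss = math.exp(-w * w / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
    return t ** gamma * 2.0 * w / t * gauss

def ridge_scale(w, gamma):
    """Closed-form maximizer t = w**2 / (3 - 2*gamma).
    Returns None at or above the critical value gamma = 3/2."""
    if gamma >= 1.5:
        return None
    return w * w / (3.0 - 2.0 * gamma)

def grid_argmax(w, gamma, t_max, steps=100_000):
    """Scale on a regular grid (0, t_max] with the largest center response."""
    scales = [t_max * k / steps for k in range(1, steps + 1)]
    return max(scales, key=lambda t: ridge_center_response(t, w, gamma))

if __name__ == "__main__":
    for gamma in (0.75, 1.25, 1.5, 1.75):
        print(gamma, ridge_scale(2.0, gamma), grid_argmax(2.0, gamma, 100.0))
```

Below $\gamma = 1.5$ the grid search agrees with the closed form; at and above it the largest response always sits on the upper boundary of the search interval, which is the numerical signature of an extremum that has been pushed to infinity.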

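[Figure 3: response surfaces for several values of $\gamma$; the surviving panel labels are $\gamma = 0.0$ and $\gamma = 0.75$.]

Fig. 3. Response of second derivative ridge detector $-GL_{xx}$ to a step ridge
model. Zero-crossings of the first derivative along space and scale are shown
below each surface. With increasing $\gamma$ the central maximum is “pushed” to
larger scales.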
6 Linelike Structures in Two Dimensions

Several methods for the detection of linelike structures in two or three
dimensional images have been proposed in the context of fixed scales (see e.g. p, g|,
and m) and still more differences exist when scales are to be selected
automatically 0, 0, B3, 0, ca. In the following a brief description is given of a local
approach that uses a second derivative of Gaussian operator and profits from
the critical $\gamma$-values for edges and ridges described above.

The detection of ridges in two-dimensional images proceeds in two steps.
First at each point a set of orthogonal directions is chosen, one pointing along
the hypothetical ridge and the other perpendicular to the ridge. Then, at each
point, the image intensity is analyzed in the direction across the hypothetical
ridge to see if the point is on a ridge or not.

The direction along which a ridge extends is defined as the direction of
maximum second derivative and is denoted as $q$. The orthogonal axis of minimum
second derivative traverses the ridge and is denoted by $p$. Within these local
coordinates a ridge at fixed scales is defined as the set of points where $L_{pp}$ is
a local minimum along $p$ and additionally the conditions $L_{pp} < 0$ as well as
$|L_{pp}| \geq |L_{qq}|$ are satisfied. In terms of zero-crossings the defining equations are:

$$L_{ppp} = 0, \quad L_{pppp} > 0, \quad L_{pp} < 0, \quad |L_{pp}| \geq |L_{qq}| \qquad (3)$$
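The local coordinates $p$ and $q$ and the two sign conditions of the ridge definition can be computed from the Hessian at a single point. The sketch below assumes that the second derivatives $L_{xx}$, $L_{xy}$, $L_{yy}$ have already been obtained at the scale of interest; it does not cover the zero-crossing condition on $L_{ppp}$, which needs third derivatives. All names are illustrative.

```python
import math

def principal_second_derivatives(lxx, lxy, lyy):
    """Return (l_pp, l_qq, p_dir) for the symmetric Hessian
    [[lxx, lxy], [lxy, lyy]]: l_pp is the minimum and l_qq the maximum
    second directional derivative, p_dir a unit vector along p."""
    mean = 0.5 * (lxx + lyy)
    diff = 0.5 * (lxx - lyy)
    radius = math.hypot(diff, lxy)
    l_pp, l_qq = mean - radius, mean + radius
    if radius == 0.0:
        p_dir = (1.0, 0.0)  # isotropic case: every direction is principal
    else:
        # angle of the eigenvector that belongs to the larger eigenvalue (q)
        angle = 0.5 * math.atan2(2.0 * lxy, lxx - lyy)
        # p is perpendicular to q
        p_dir = (-math.sin(angle), math.cos(angle))
    return l_pp, l_qq, p_dir

def satisfies_sign_conditions(l_pp, l_qq):
    """Check L_pp < 0 and |L_pp| >= |L_qq| from the ridge definition."""
    return l_pp < 0.0 and abs(l_pp) >= abs(l_qq)

if __name__ == "__main__":
    # bright ridge running along the y-axis: strong negative curvature across it
    l_pp, l_qq, p_dir = principal_second_derivatives(-2.0, 0.0, -0.5)
    print(l_pp, l_qq, p_dir, satisfies_sign_conditions(l_pp, l_qq))
```

For the example Hessian the direction $p$ comes out along the x-axis, across the ridge, and both sign conditions hold; a valley with positive curvatures fails the test.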

Figure 4 shows the ridges of the diamond image computed from definition (3)
at three different scales. (For computational details refer to ^1] or P|.)




Fig. 4. Original image (left) and ridges computed at three different scales. The
image has 512 by 512 pixels and the chosen scale levels are $\sqrt{t} = 8$, $\sqrt{t} = 16$,
$\sqrt{t} = 24$.



From figure 4 it is clear that the results of fixed scale ridge computations
can depend significantly on the choice of scale. Worse still, at scales that are
relatively small compared to the size of the image structures the method detects
edges instead of ridges.

Previous studies of line detection with automatic scale selection have reported
that edges also pose a problem in the variable scale context. A number
of approaches have been proposed to avoid these “false” responses. Koller 0
suggests a nonlinear operator that combines the response of two edge-detectors
on both sides of a ridge. Lorenz et al. m use an edge-indicator to suppress the
response to edges. Lindeberg H proposes to compute positions of ridges in terms
of extrema of one operator and scales in terms of extrema of another operator.

The considerations of the previous sections suggest that a “correct” choice
of $\gamma$-parameter makes it possible to use a simple second derivative of Gaussian
operator to detect ridges without running into the problem of also detecting edges.
In particular, if one defines a second derivative scale-space ridge to be the set
of points where $t^{\gamma} L_{pp}$ has a local minimum along $p$ as well as along the
scale-dimension $t$ and additionally the conditions $L_{pp} < 0$ as well as $|L_{pp}| \geq |L_{qq}|$ are
satisfied, then the results of section 5 can be applied directly. In other words a
$\gamma$-value of $\gamma = 1.25$ makes it possible to detect ridges and “escape” edges.

Figures 5 and 6 display the second derivative scale-space ridges computed
from the diamond image for $\gamma = 1.25$. Figure 5 shows a projection of the results
onto the image plane. Figure 6 displays the ridges from two points of view that
also show the scale axis. Evidently, edges are not detected even though the range
of scales covers the value $\sqrt{t} = 8$ at which edges pose a problem in the fixed
scale setting of figure 4.

Concerning ridges, the variable scale approach finds both the ridges
corresponding to the long axes of each diamond and the ridges corresponding to the
rows of diamonds. Notably, although these ridges cross each other in the
projection of figure 5, they occur at separate scales as can be seen in figure 6.
These merits of the variable scale approach are well known and are not the
focus of attention here. In the present context figures 5 and 6 should only
serve to illustrate that in the variable scale setting the response to edges of a
second derivative of Gaussian operator can be “turned off” simply by a suitable
choice of $\gamma$-parameter.





Fig. 5. Second derivative scale-space ridges projected onto the image plane. The
original image has 256 by 256 pixels. Ridges were computed for $\gamma = 1.25$ and
scales in the range 1 to 28 in steps of 1.0 (unit length = 1 pixel width/height).

Fig. 6. Second derivative scale-space ridges projected along different directions.
The scale-axis is the one with tics and numerical values.

7 Summary 

Given some $\gamma$-normalized operator to be used for detection there is for any
specific image structure a certain range of $\gamma$-values within which the structure is
assigned a finite scale. For values of $\gamma$ greater than some critical $\gamma$ the extrema of
the operator response with respect to scale disappear so that above this critical
$\gamma$ the structure cannot be detected. In other words, the choice of $\gamma$ determines
which structures can be detected and which cannot.
Abstract. Receptive field sensitivity profiles of visual front-end cells in
the LGN and V1 area in intact animals can be measured with increasing
accuracy, both in the spatial and temporal domain. This creates a pressing
need for mathematical models. Scale-space theory, as a theory of (multiscale)
apertures as operators on observed data, is concerned with the
mathematical modeling of front-end visual system behaviour. This paper
compares recent measurements on the spatio-temporal response of LGN
cell and V1 simple cell receptive fields with Koenderink’s results from
axiomatic reasoning for a real-time measuring spatio-temporal differential
operator [2]. In this model time must be logarithmically remapped
to make the operation causal in the temporal domain.



1 Scale-Space Kernel Derivation from Entropy 
Maximization 

The Gaussian kernel as the fundamental linear scale-space kernel for an
uncommitted observation is now well established. Many fundamental derivations have
been proposed (for an extensive overview see Weickert [3]). In this
paper we present an alternative way to derive the Gaussian kernel as the
scale-space kernel of an uncommitted observation. It is based on the notion that the
’uncommittedness’ is expressed in a statistical way using the entropy of the
observed signal. The reasoning is due to Mads Nielsen, IT-University Copenhagen
m-

First of all, we want to do a measurement, i.e. we have a device which has
some integration area with a finite width by necessity. The measurement should
be done at all locations in the same way, i.e. with either a series of identical
detectors, or the same detector measuring at all places: the measurement should
be invariant for translation. We want the measurement to be linear in the signal
to be measured (e.g. the intensity): invariance for translation along the intensity
axis. These requirements lead automatically to the formulation that the
observation must be a convolution: $h(x) = \int_{-\infty}^{\infty} L(\alpha)\, g(x - \alpha)\, d\alpha$. $L(x)$ is the observed
variable, e.g. the luminance, $g(x)$ is the aperture function, $h(x)$ the result of the
measurement.



M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 255-^^3 2001. 
© Springer-Verlag and IEEE/CS 2001






Bart M. ter Haar Romeny, Luc M.J. Florack, and Mads Nielsen 



The aperture function $g(x)$ should be a unity filter, i.e. normalized, which
means that the integral over its weighting profile should be unity:
$\int_{-\infty}^{\infty} g(x)\, dx = 1$. The mean of the filter $g(x)$ should be at the location where we measure, e.g.
at $x_0$, so the expected value (or first moment) should be $x_0$: $\int_{-\infty}^{\infty} x\, g(x)\, dx = x_0$.
Because we may take any point for $x_0$, we may as well take for our further calculations
the point $x_0 = 0$.

The size of the aperture is an essential element. We want to be free in the choice
of this size, so at least we want to find a family of filters where this size is a
free parameter. We can then monitor the world at all these sizes by ’looking
through’ the complete set of kernels simultaneously. We call this ’size’ $\sigma$. It has
the dimension of length, and is the yardstick of our measurement. We call it
the inner scale. Every physical measurement has an inner scale. It can be μm
or lightyears; we need for every dimension a yardstick: $\sigma$. If we weight distances
$r(x)$ with our kernel, so we get $\int_{-\infty}^{\infty} r(x)\, g(x)\, dx$, we will use $r(x) = x^2$ since with
this choice we separate the dimensions: two orthogonal vectors fulfill $r(a + b) =
r(a) + r(b)$. We call the weighted metric $\sigma^2$: $\int_{-\infty}^{\infty} x^2 g(x)\, dx = \sigma^2$.

Finally we incorporate the request to be as uncommitted as possible. We
want no filter with some preference at this first stage of the observation. We
want, in statistical terms, the ’orderlessness’ or disorder of the measurement as
large as possible. There should be no ordering, ranking, structuring or
whatsoever. Physically the measure for disorder is expressed through the entropy
$H = \int_{-\infty}^{\infty} -g(x) \ln g(x)\, dx$ where $\ln x$ is the natural logarithm. We look for the
$g(x)$ for which the entropy is maximal, given the constraints derived before:

$$\int_{-\infty}^{\infty} g(x)\, dx = 1, \quad \int_{-\infty}^{\infty} x\, g(x)\, dx = 0, \quad \int_{-\infty}^{\infty} x^2 g(x)\, dx = \sigma^2$$



To find a maximum under a set of given constraints, we apply the method
of Euler-Lagrange equations. The Lagrangian $e$ becomes:

$$e = \int_{-\infty}^{\infty} \left( -g(x) \ln g(x) + \lambda_1\, g(x) + \lambda_2\, x\, g(x) + \lambda_3\, x^2 g(x) \right) dx$$

The condition to be extremal for a certain $g(x)$ is given by the vanishing of the
first variation (corresponding to the first derivative, but in this case with respect
to a function) to $g(x)$: $\frac{\partial e}{\partial g} = 0$. This gives us: $-1 + \lambda_1 + x \lambda_2 + x^2 \lambda_3 - \ln g(x) = 0$,
from which we can easily solve $g(x)$:

g[x_] = y /. Solve[Log[y] == -1 + λ1 + x λ2 + x^2 λ3, y] // First

e^(-1 + λ1 + x λ2 + x^2 λ3)



So, $g(x)$ is an exponential function with constant, linear and quadratic terms
of $x$ in the exponent. At least $\lambda_3$ must be negative, otherwise the function
explodes, which is physically unrealistic. We need the explicit expressions for our
constraints, so we make the following set of constraint equations, simplified with
the condition of $\lambda_3 < 0$:

{eqn1 = Simplify[Integrate[g[x], {x, -Infinity, Infinity}] == 1, λ3 < 0],
 eqn2 = Simplify[Integrate[x g[x], {x, -Infinity, Infinity}] == 0, λ3 < 0],
 eqn3 = Simplify[Integrate[x^2 g[x], {x, -Infinity, Infinity}] == σ^2, λ3 < 0]}

{e^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[π] / Sqrt[-λ3] == 1,
 e^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[π] λ2 / (2 (-λ3)^(3/2)) == 0,
 e^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[π] (λ2^2 - 2 λ3) / (4 (-λ3)^(5/2)) == σ^2}

Now we can solve for all three $\lambda$’s:

Off[Solve::ifun]; solution = Solve[{eqn1, eqn2, eqn3}, {λ1, λ2, λ3}]

{{λ1 -> 1/4 Log[e^4/(4 π^2 σ^4)], λ2 -> 0, λ3 -> -1/(2 σ^2)}}

g[x_] = Simplify[E^(-1 + λ1 + x λ2 + x^2 λ3) /. Flatten[solution], σ > 0]

e^(-x^2/(2 σ^2)) / (Sqrt[2 π] σ)

which is the Gaussian function as the unique solution to the set of
constraints, which in principle are a formal statement of the uncommitment of the
observation.
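The result can be checked numerically without a computer algebra system. The sketch below is an illustration, not part of the original derivation: it plugs the multiplier values $\lambda_1 = 1 - \frac{1}{2}\ln(2\pi\sigma^2)$, $\lambda_2 = 0$, $\lambda_3 = -1/(2\sigma^2)$ into $g(x) = e^{-1 + \lambda_1 + x\lambda_2 + x^2\lambda_3}$ and verifies the three constraints with a trapezoidal rule. The function names and the integration range of twelve standard deviations are assumptions made for the example.

```python
import math

def maxent_kernel(x, sigma):
    """Kernel exp(-1 + lam1 + lam2*x + lam3*x**2) with the multipliers
    that satisfy the normalization, mean and variance constraints."""
    lam1 = 1.0 - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    lam2 = 0.0
    lam3 = -1.0 / (2.0 * sigma ** 2)
    return math.exp(-1.0 + lam1 + lam2 * x + lam3 * x * x)

def moment(k, sigma, half_width=12.0, steps=20_000):
    """Trapezoidal estimate of the integral of x**k * g(x) over
    [-half_width*sigma, +half_width*sigma]."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** k * maxent_kernel(x, sigma)
    return total * h

if __name__ == "__main__":
    sigma = 2.0
    print(moment(0, sigma), moment(1, sigma), moment(2, sigma))
```

The three printed moments should come out close to 1, 0 and $\sigma^2$, and `maxent_kernel` coincides with the usual normalized Gaussian density.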



2 Scale-Time 

In the time domain we encounter sampled data just as in the spatial domain.
E.g. a movie is a series of frames, samples taken at regular intervals. In the
spatial domain we need an integration over a spatial area; in the temporal domain
we need to have an aperture in time, integrating for some time to perform the
measurement. This is the integration time. Systems with a short or long
integration time are said to have a fast or slow response, respectively. The integration time by
necessity needs to have a finite duration (temporal width) in time, so a scale-space
construct is a physical necessity again. Furthermore, time and space are
incommensurable dimensions, so we need a scale-space for space and a scale-space for
time.

Time measurements can essentially be processed in two ways: as pre-recorded
frames or instances, or in real time. Temporal measurements stored for later replay
or analysis, on whatever medium, fall in the first category. Humans continuously
perform a temporal analysis with their senses; they measure in real time and
are part of the second category. The scale-space treatment of these two categories
will turn out to be essentially different.

Prerecorded sequences can be analyzed in a manner completely analogous
to the spatial treatment of scaled operators; we just interchange space with
time. The notion of temporal scale $\sigma_t$ then naturally emerges, which is the
temporal resolution, a device property when we look at the recorded data (it is
the inner scale of the data), and a free parameter in the multiscale analysis.

In the real-time measurement and analysis of temporal data we have a
serious problem: the time axis is only a half axis: the past. There is a sharp and
unavoidable boundary on the time axis: the present moment. This means that
we can no longer apply our standard Gaussian kernels, because they have an (in
theory) infinite extent in both directions. There is no way to include the future
in our kernel; it would be a strong violation of causality. But there may be a way
out when we derive from first principles a new kernel that fulfils the constraint
of causality: a kernel defined on a logarithmically remapped time axis. From
this new causal kernel we might again derive the temporal and spatio-temporal
family of scaled derivative operators. Koenderink [2] has presented the reasoning
to derive the theory, and we will discuss it in detail below.

Some other fine papers discussing the real-time causal
scale-space in detail have appeared, by Florack and Lindeberg, Fagerström and Bretzner
I . Lindeberg also discusses the automatic selection of temporal scale m-



3 Causal Time-Scale Is Logarithmic 

For realtime systems the situation is completely different. We noted in the
previous section that we can only deal with the past, i.e. we only have the half
time-axis. This is incompatible with the infinite extent of the Gaussian kernel
to both sides. In Koenderink’s words: “Because the diffusion spreads
influences with infinite speed any blurring will immediately spread into the remote
future thereby violating the principle of temporal causality. It is clear that the
scale-space method can only lead to acceptable results over the complete axis,
but never over a mere semi-axis. On the other hand the diffusion equation is the
unique solution that respects causality in the resolution domain. Thus there can
be no hope of finding an alternative. The dilemma is complete” [2].

The solution, proposed by Koenderink, is to remap (reparametrize) the half
t-axis into a full axis. The question is then how this should be done. We follow
here Koenderink’s original reasoning to come to the mapping function, and to
derive the Gaussian derivative kernels on the new time axis.

We call the remapping $s(t)$. We define $t_0$ as the present moment, which can
never be reached: as soon as we try to measure it, it is already later. It
is our absolutely defined reference point, our fiducial moment. Every realtime
measurement is relative to this point in time. Then $s$ should be a function of
$\mu = t_0 - t$, so $s(\mu) = s(t_0 - t)$. We choose the parameter $\mu$ to be dimensionless,
with $\mu = 0$ for the present moment and $\mu \to \infty$ for the infinite past. So we
get $s(\mu) = s\left(\frac{t_0 - t}{\tau}\right)$. The parameter $\tau$ is some time constant and is essentially
arbitrary. It is the scale of our measurement, and we should be able to give it
any value, so we want the diffusion to be scale-invariant on the $\mu$-domain.

We want shift invariance on this time axis and the application of different
clocks, so we require that a transformation $t' = at + b$ leaves $s(t)$ invariant; $\mu$ is
invariant if we change clocks.

On our new time-axis $s(t)$ the diffusion should be a normal causal diffusion.
On every point of the s-axis we have the same amount of diffusion, i.e. the
diffusion is homogeneous on the s-domain. The ’inner scale’ or resolution of our
measurement has to become smaller and smaller when we want to approach
the present moment. But even if we use femtosecond measuring devices, we will
never catch the present moment. On the other side of the s-axis, a long time
ago, we don’t want that high resolution. An event some centuries ago is placed
with a resolution of say a year, and the moment that the dinosaurs disappeared
from earth, say some 65 million years ago, is referred to with an accuracy of a
million years or so.
This intuitive reasoning is an expression of the requirement that we want
our time-resolution $\tau$ on the s-axis to be proportional to $\mu$, i.e. $\tau \sim \mu$ or $\frac{\tau}{\mu} =$
constant. So for small $\mu$ we have a small resolution, for large $\mu$ a large one.

Normal causal diffusion on the s-axis means that the ’magnification’ $\left| \frac{ds}{d\mu} \right|$
should be proportional to $\frac{1}{\mu}$. Then the s-axis is ’stretched’ for every $\mu$ in such a
way that the scale (or ’diffusion length’ as Koenderink calls it) in the s-domain
is a constant relative diffusion length in the $\mu$-domain. Uniform sampling in the
s-domain gives a graded resolution history in the t- or $\mu$-domain. In formula:
$\left| \frac{ds}{d\mu} \right| \sim \frac{1}{\mu}$ or $\left| \frac{ds}{d\mu} \right| = \frac{\alpha}{\mu}$. From this differential equation we derive that
the mapping $s(\mu)$ must be logarithmic: $s(\mu) = \alpha \ln \mu + c_1$.

So our mapping for $s$ is now: $s = \alpha \ln\left(\frac{t_0 - t}{\tau}\right) +$ constant. The constant is an
arbitrary translation, under which we required invariance, so we choose this
constant to be zero. We choose the arbitrary scaling parameter $\alpha$ to be unity, so
we get: $s = \ln\left(\frac{t_0 - t}{\tau}\right)$.

This is a fundamental result. For a causal interpretation of the time axis we
need to sample time in a logarithmic fashion. It means that the present moment
is mapped to infinity, which conforms to our notion that we can never reach it.
We can now freely diffuse on the s-axis, as we have a well defined scale at all
moments on our transformed time axis. See figure 1.
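The consequence of the logarithmic remapping for sampling can be illustrated with a few lines of code. This is a minimal sketch, assuming the mapping $s = \ln((t_0 - t)/\tau)$ with $\alpha = 1$ and zero offset; the function names are illustrative.

```python
import math

def to_s(t, t0, tau):
    """Map a past moment t < t0 onto the s-axis: s = ln((t0 - t) / tau)."""
    if t >= t0:
        raise ValueError("only moments before the present t0 can be mapped")
    return math.log((t0 - t) / tau)

def from_s(s, t0, tau):
    """Inverse mapping from the s-axis back to ordinary time."""
    return t0 - tau * math.exp(s)

def sample_times(t0, tau, s_min, s_max, count):
    """Moments in ordinary time that correspond to `count` uniformly
    spaced samples on the s-axis between s_min and s_max."""
    step = (s_max - s_min) / (count - 1)
    return [from_s(s_min + k * step, t0, tau) for k in range(count)]

if __name__ == "__main__":
    times = sample_times(t0=0.0, tau=1.0, s_min=-2.0, s_max=2.0, count=5)
    ages = [0.0 - t for t in times]  # distance into the past
    print(times)
    print([ages[k + 1] / ages[k] for k in range(len(ages) - 1)])
```

Uniform steps in $s$ correspond to a constant ratio between successive distances into the past, so the sampling is dense near the present moment and coarse for remote events.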




Fig. 1. The logarithmic mapping of the horizontal t-time half-axis onto the
vertical s-time full axis. The present moment $t_0$, at $t = 1$ in this example (indicated
by the vertical dashed line), can never be reached. The s-axis is now a full axis,
and fully available for diffusion. The inset shows a typical working area for a
realtime system. The response time delimits the area at the right, the lifetime
at the left. Figure adapted from |^.



In the s-domain we can now run the diffusion equation without violation of 
temporal causality. 

4 A Causal Semi-axis Is Logarithmic 

Florack ^ came to the same result from a different perspective, from abstract
mathematics. He used a method from group theory. A group is formally defined
as a set of similar transformations, with a member that does the unity
operation (projects on itself, i.e. does nothing, e.g. rotation over zero degrees, an
enlargement of 1, a translation of zero etc.); every member must have an inverse (e.g.
rotation clockwise, but also anti-clockwise) and one must be able to concatenate
its members (e.g. a total rotation which consists of two separate rotations after
each other).

Florack studied the group properties of whole and half axes of real numbers.
The group of summations is a group on the whole axis, which includes the
positive and the negative numbers. This group however is not a group on the
half axis, for we might be able to do a summation which has a result outside
the allowed domain. The group of multiplications however is a group on the
positive half axis. Two numbers multiplied from the half axis give a result on
the same half axis. If we could make all multiplications into sums, we would
have an operation that makes it a group again. The formal transformation from
multiplications into sums is the logarithmic function: $\ln(a \cdot b) = \ln(a) + \ln(b)$ and
its inverse: $e^{a+b} = e^{a} \cdot e^{b}$. The identity element is addition of zero, or multiplication
by one. So the result is the same logarithmic function as the function of choice
for the causal parametrization of the half axis.

Lindeberg and Fagerström H derived the causal temporal differential
operator from the non-creation of local extrema (zero-crossings) with increasing
scale.

Interestingly, we encounter more often a logarithmic parametrization of a 
half axis when the physics of observations is involved: 

— Light intensities are only defined for positive values, and form a half axis. It 
is well known e.g that the eye performs a logarithmic transformation on the 
intensity measured by its receptors on the retina. 

— Scale is only defined for positive values, and forms a half axis (scale-space).
The natural scalestep $\tau$ on the scale-axis in scale-space is the logarithm of
the diffusion scale $\sigma$: $\tau = \ln(\sigma) - \ln(\sigma_0)$.



5 Real-Time Receptive Fields 



We now have all the information needed to study the shape of the causal temporal
derivative operators. The kernel in the transformed s-domain was given above. The kernel
in the original temporal domain $t$ becomes

$$K(t, t_0; \tau) = \frac{1}{\sqrt{2\pi}\,\tau}\; e^{-\frac{1}{2} \ln^2\left(\frac{t_0 - t}{\tau}\right)}$$

In figure 2 we see that the Gaussian kernel and its temporal derivatives are
skewed, due to the logarithmic time axis remapping. It is clear that the present
moment $t_0$ can never be reached. The zerocrossing of the first order derivative
(and thus the peak of the zeroth order kernel) is just at $t = -\tau$.
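The skewness and the position of the peak can be examined numerically. The sketch below assumes a Gaussian profile on the remapped axis $s = \ln((t_0 - t)/\tau)$; the `width` parameter and the normalization factor are assumptions made for this illustration, not values taken from the text. Only the shape matters for the two properties shown.

```python
import math

def causal_kernel(t, t0, tau, width=1.0):
    """Gaussian on the remapped axis s = ln((t0 - t) / tau), evaluated at
    the moment t. Returns 0 for t >= t0, so the future never contributes.
    `width` (assumed) is the standard deviation on the s-axis."""
    if t >= t0:
        return 0.0
    s = math.log((t0 - t) / tau)
    norm = math.sqrt(2.0 * math.pi) * width * tau
    return math.exp(-0.5 * (s / width) ** 2) / norm

if __name__ == "__main__":
    t0, tau = 0.0, 200.0  # milliseconds, as in the figure caption
    grid = [float(t) for t in range(-1000, 0)]
    peak = max(grid, key=lambda t: causal_kernel(t, t0, tau))
    print("peak at t =", peak)
    # equal steps on either side of the peak give different values: skewness
    print(causal_kernel(peak - 100.0, t0, tau), causal_kernel(peak + 100.0, t0, tau))
    # equal ratios on either side of the peak give the same value
    print(causal_kernel(-100.0, t0, tau), causal_kernel(-400.0, t0, tau))
```

The peak sits at $t = t_0 - \tau$, and the profile is symmetric in the ratio $(t_0 - t)/\tau$ rather than in $t$ itself, which is what produces the skewed appearance in ordinary time.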
Fig. 2. Left, middle, right: the zeroth, first and second Gaussian temporal
derivative operator in causal time. The timescale in each plot runs from the past, to
the right. The temporal scale $\tau = 200$ ms, the right boundary is the present,
$t_0 = 0$. Note the pronounced skewness of the kernels.



6 A Scale-Space Model for Time-Causal Spatio-Temporal 
Cortical Receptive Fields 



Precise measurements of the spatio-temporal properties of macaque monkey and
cat LGN and cortical receptive fields are possible |1 1 11 2j n. They give support for
the scale-time theory for causal time sampling. De Valois, Cottaris, Mahon, Elfar
and Wilson PJ applied the method of reverse correlation and multiple receptive
field mapping stimuli (m-sequence, maximum length white noise stimuli) to map
numerous receptive fields with high spatial and temporal resolution. Some of the
resulting receptive field maps are shown in figure 3.




Fig. 3. Examples of spatio-temporal receptive field maps of a sample of V1
simple cells of macaque monkey. Vertical axis in each plot: time axis from 0
ms (bottom) to 200 ms (top). Horizontal axis per plot: space (in degrees), a:
0-0.9, b: 0-1.2, c,d: 0-0.6, e: 0-1.9, f: 0-1.6 degrees. Note the skewed sensitivity
profiles in the time direction, especially in subfigures e and f. Every ’island’ has 
opposite polarity to its neighboring ’island’ in each plot. Due to black and white 
reproduction the sign of the response could not be reproduced. The scale-space 
models for the plots are respectively: a: b: c: d: e: > f: 

Adapted from p. 



See also totoro.berkeley.edu/Demonstrations/VSOC/teaching/RF/LGN.html 

If we plot the predicted sensitivity profiles according to Gaussian scale-space 
theory we get remarkably similar results. In figure 4 the space-time plots are 
shown for zeroth to second spatial and temporal differential order. Note the 
skewness in the temporal direction. 

Clear[gt, gs, n]; τ = 0.3; σ = 2; 

(* temporal kernel on the logarithmic time axis and its n-th derivative *) 
gt[n_] := D[Exp[-1/2 Log[t/τ]^2], {t, n}]; 

(* spatial Gaussian kernel and its n-th derivative *) 
gs[n_] := D[Exp[-x^2/(2 σ^2)], {x, n}]; 

Block[{$DisplayFunction = Identity}, 

p = Table[ContourPlot[Evaluate[gt[i] gs[j]], {x, -15, 15}, {t, .01, .8}, 

PlotPoints -> 30], {i, 0, 1}, {j, 0, 2}]]; Show[GraphicsArray[Flatten[p]]]; 




Fig. 4. Model for time-causal spatio-temporal receptive field sensitivity profiles 
from Gaussian scale-space theory. From left to right: a: L, followed by spatial and 
temporal derivatives of L. Vertical axis: time. Horizontal axis: space. 



7 Conclusion 

The causal-time multiscale temporal differential operator model from Gaussian 
scale-space theory has not yet been tested against the wealth of currently avail- 
able receptive field measurement data. It may be an interesting experiment, to 
test the quantitative similarity, and to find the statistics of the applied spatial 
and temporal scales, as well as the distribution of the differential order. The 
Gaussian scale-space model is especially attractive because of its robust physical 
underpinning by the principle of temporal causality, leading to the natural 
notion of the logarithmic mapping of the time axis in a real-time measurement 
(see also [El). 

The distributions of the locations of the different scales and the differential 
orders have not yet been mapped on the detailed cortical orientation column with 
the pinwheel structure. Orientation has been clearly charted due to spectacular 
developments in optical dye high resolution recording techniques in awake ani- 
mals. Is the scale of the operator mapped along the spokes of the pinwheel? Is 
the central singularity in the repetitive pinwheel structure the largest scale? Is 
differential order coded in depth in the column? 

These are all new questions arising from a new model. The answer to these 
questions can be expected within a reasonable time, given the fast developments 
both in high-resolution recording techniques and in the resolution of non-invasive 
mapping techniques such as high-field functional magnetic resonance imaging 
(fMRI) [14]. 

In summary: When a time sequence of data is available in stored form, we 
can apply the regular symmetric Gaussian derivative kernels as causal multiscale 
differential operators for temporal analysis, in complete analogy with the spatial 
case. When the measurement and analysis are real-time, we need a reparametrization 
of the time axis in a logarithmic fashion. The resulting kernels are skewed 
towards the past. The present can never be reached; the new logarithmic axis 
guarantees full causality. The derivation follows from the first principle of 
a scale of observation on the new time axis which is proportional to the time 
the event happened. This seems to fit well with the intuitive perception of time by 
humans. 

Recent physiological measurements of LGN cell receptive fields and cortical 
V1 simple cell receptive fields suggest that the biological system may employ 
the temporal and spatiotemporal differential operators. Especially striking is 
the observed skewness in the temporal domain, giving support for the working 
of the biological cells as time-causal temporal differential operators. 

Abstract. The Gaussian serves as Green's function for the linear diffusion 
equation and as a source for intuitive understanding of the linear 
diffusion process. In general, non-linear diffusion equations have no 
known closed form solutions and thereby no equally simple description. 
This article introduces a simple, intuitive description of these processes 
in terms of the Diffusion Echo. The Diffusion Echo offers intuitive visu- 
alisations for non-linear diffusion processes. 

In addition, the Diffusion Echo has potential for offering simple formu- 
lations for grouping problems. Furthermore, the Diffusion Echo can be 
considered a deep structure summary and thereby offers an alternative 
to multi-scale linking and flooding techniques. 



1 Introduction 

Linear scale-space mm is the canonical un-committed scale-space with ap- 
pealing theoretical properties. Among these are the existence of a Green’s func- 
tion for the PDE (partial differential equation) in terms of the Gaussian. Besides 
providing a closed form solution to the PDE, the Gaussian yields a clear, intuitive 
understanding of the local filtering process. 

Non-linear scale-spaces are appropriate for enhancement of desired features 
and for extraction of certain deep structure features (in for instance edge detec- 
tion 0 and segmentation). These diffusion schemes can typically be formulated 
as PDE’s where a diffusion tensor determines the non-linear nature 0. In gen- 
eral, these PDE’s have no known closed form solutions. This necessitates iterative 
numerical approximation schemes which offer less intuition. 

Section 2 contains a presentation of the diffusion schemes used in the article. 
The Diffusion Echo is introduced in section 3 with examples of how the Diffusion 
Echo can be used for visualisation of the diffusion schemes. Finally, potential 
applications of the Diffusion Echo are presented: 

- Grouping of features, for instance used for segmentation (section 4). 

- As a deep structure summary that can serve as an alternative to multi-scale 
linking or flooding techniques (section 5). 



M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 264-^^3 2001. 
© Springer-Verlag and IEEE/CS 2001 

2 Diffusion Schemes 

A number of diffusion schemes are explored. All schemes use a PDE to define a 
scale-space $L(x; t)$, where x are spatial coordinates and t the scale parameter. 
The PDEs have an image I as initial condition: $L(x; 0) = I(x)$. 

Linear diffusion can be defined by $L_t(x; t) = \Delta L(x; t)$, the heat diffusion 
equation. The Gaussian with standard deviation $\sigma = \sqrt{2t}$ is Green's function 
for the PDE. 

The isotropic non-linear Perona-Malik scheme preserves edges during the 
diffusion [7]: $L_t(x; t) = \mathrm{div}(\, p(|\nabla L_\sigma|^2)\, \nabla L \,)$ where 
$p(|\nabla L_\sigma|^2) = 1/(1 + |\nabla L_\sigma|^2/\lambda^2)$. 

The regularisation parameter $\sigma$ is due to [2]. The notation $\nabla L_\sigma$ means the 
gradient evaluated at scale $\sigma$. The parameter $\lambda$ is a soft threshold for the gradient 
magnitude required to locally slow the diffusion and preserve an edge. Following 
the terminology of Weickert [9], the scheme is termed "isotropic" since the 
diffusivity function p is scalar-valued. 

2.1 Generalised Anisotropic Diffusion 

Weickert [9] defines the anisotropic non-linear diffusion equation: 

$$L_t(x; t) = \mathrm{div}(\, D(J_\rho(\nabla L_\sigma))\, \nabla L \,)$$

The diffusion tensor $D \in C^\infty$ is assumed to be symmetric and 
uniformly positive definite. The structure tensor $J_\rho$ is evaluated at integration 
scale $\rho$ (set to zero in the following), and the gradient $\nabla L_\sigma$ at sampling scale $\sigma$. 

Diffusion schemes can be defined in terms of the eigenvalues $\lambda_1$ and $\lambda_2$ for 
the corresponding eigenvectors $v_1 \parallel \nabla L_\sigma$, $v_2 \perp \nabla L_\sigma$ of the diffusion tensor D. 
A large number of diffusion schemes (including the previous) are generalized by 
the Generalized Anisotropic Non-linear scheme (GAN), where the diffusion 
tensor eigenvalues are defined by: 

$$w(m, \lambda, |\nabla L_\sigma|) = \begin{cases} 1 & |\nabla L_\sigma| = 0 \\ 1 - \exp\left(-C_m / (|\nabla L_\sigma|/\lambda)^m\right) & |\nabla L_\sigma| > 0 \end{cases} \tag{1}$$

$$\lambda_1 = w(m, \lambda, |\nabla L_\sigma|), \qquad \lambda_2 = \theta + (1 - \theta)\,\lambda_1 \tag{2}$$

The scheme is "anisotropic" when the eigenvalues are not equal (then D cannot 
simply be replaced by a scalar-valued function). The global parameter $\theta$ 
determines the degree of anisotropy (0 is isotropic diffusion and 1 is fully anisotropic), 
$\lambda$ is the soft edge threshold, and m is the "aggressiveness" with which the edges are 
preserved. 

The GAN scheme has the following schemes as special cases: 

- Linear Gaussian diffusion is defined by $\lambda \to \infty$. 

- The regularised Perona-Malik scheme is approximated by $\theta = 0$ and m = 
0.75 (which implies $C_m$ = 3.31488 [9]). 

- Weickert's Edge Enhancing diffusion (EED) is defined by $\theta = 1$ and m = 4 
(which implies $C_m$ = 3.31488 [9]). 
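
The eigenvalue definitions can be sketched in a few lines. This is a minimal illustration, assuming the diffusivity w has the exponential form $1 - \exp(-C_m/(s/\lambda)^m)$ for positive gradient magnitude s; the function names `diffusivity` and `gan_eigenvalues` are hypothetical, and the numerical values (m = 4, $C_m$ = 3.31488, threshold 220) are the ones quoted in the text and in the caption of Fig. 2.

```python
import math

def diffusivity(m, lam, s, c_m):
    """Assumed form of w(m, lambda, s): 1 at s = 0, decaying for s above lambda."""
    if s <= 0:
        return 1.0
    ratio = (s / lam) ** m
    if ratio == 0.0:  # underflow for vanishing gradients: treat as a flat region
        return 1.0
    return 1.0 - math.exp(-c_m / ratio)

def gan_eigenvalues(grad_mag, m, lam, theta, c_m):
    """Eigenvalues of the diffusion tensor across (lambda_1) and along (lambda_2) the edge."""
    lambda_1 = diffusivity(m, lam, grad_mag, c_m)
    lambda_2 = theta + (1.0 - theta) * lambda_1
    return lambda_1, lambda_2

# Edge Enhancing diffusion (theta = 1, m = 4) at a strong edge and in a flat region
eed_edge = gan_eigenvalues(2200.0, 4, 220.0, 1.0, 3.31488)
eed_flat = gan_eigenvalues(0.0, 4, 220.0, 1.0, 3.31488)
# Isotropic setting (theta = 0): both eigenvalues coincide
iso_edge = gan_eigenvalues(2200.0, 4, 220.0, 0.0, 3.31488)
```

With $\theta = 1$ the diffusion along the edge stays at full strength while the diffusion across a strong edge is almost switched off; with $\theta = 0$ the tensor reduces to a scalar diffusivity.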
2.2 Corner Enhancing Diffusion 

Near "edges" the Perona-Malik scheme slows diffusion in all directions. For image 
enhancement, EED is appropriate since diffusion is full along edges. 

However, the EED scheme tends to round corners due to the full diffusion 
along the edge. Therefore, while fully anisotropic diffusion is desirable at edge-like 
structures, a diffusion scheme with a milder degree of anisotropy is desired 
at corners. A local steering of the degree of anisotropy therefore seems sensible. 

The following Corner Enhancing Diffusion scheme (CED) is similar to GAN 
but steers the anisotropy locally using a corner measure: the isophote curvature 
$\kappa$ times the gradient to a power k. 

$$\lambda_1 = w(m_g, \lambda_g, |\nabla L_\sigma|)$$
$$\theta = w(m_i, \lambda_i, |\kappa_\sigma|\, |\nabla L_\sigma|^k)$$
$$\lambda_2 = \theta + (1 - \theta)\,\lambda_1 \tag{3}$$

The Corner Enhancing scheme is similar to the CID scheme from |2|. 

3 Visualisations 

It takes a strong mathematician to get intuition about the differences between 
the diffusion schemes above. A standard way of illustrating the schemes is to 
visualise the local diffusion at key points in an image like in the following. 

The non-linear diffusion processes are implemented using iterative numerical 
approximation schemes. For each iteration a diffusion tensor is determined for 
each point in the scale-space image. This diffusion tensor can be visualized by 
an ellipse where the orientation and the size are determined by the eigenvectors 
and corresponding eigenvalues. 

In figure 1 EED is illustrated like this. In isolation, the third image seems to 
offer an understanding of the intentions of the diffusion scheme. However, the 
illustrated diffusion tensors are deceptive since they evolve during the diffusion. 
Furthermore, they fail to capture the interaction with the surrounding area. 




Fig. 1. Visualisations of diffusion tensors for the EED scheme. Left: test image 
(64x64 pixels, intensities 0-255, SN ratio 2.5). Right three images: the local 
diffusion tensors illustrated as ellipses at five points for three different 
iterations (t = 0.4, 20, 100). An explicit approximation scheme with a nonnegativity 
discretisation was used [9]. 

3.1 The Diffusion Echo 

The Diffusion Echo is inspired by the Gaussian that defines the local filtering in 
linear diffusion. The equivalent is obtained for non-linear schemes in two steps: 

Diffusion Echo: Source 

For a fiducial point p construct an auxiliary image with the value 1 at the 
point p and zero otherwise: the discrete impulse function. 

For each iteration in a diffusion process for an image I, the values are computed 
by assigning each pixel a weighted average of a neighbourhood of pixels. 

The auxiliary image is treated with the same weighting as the image I. The 
result is a distribution that has recorded the flux that propagates from the source 
pixel p. This is the Diffusion Echo source distribution and is denoted $S_p(\cdot)$. 

Diffusion Echo: Drain 

The Diffusion Echo drain distribution is the opposite of the source. For a 
point q, the value of the drain distribution at a given point p is defined in terms 
of the source at p. Specifically, the drain distribution $D_q(\cdot)$ is $D_q(p) = S_p(q)$. 

Note that the drain for a point requires the sources for the entire image. 

The Diffusion Echo drain distribution is the local filter for the diffusion pro- 
cess equivalent to the Gaussian filter for the linear diffusion process. 

Diffusion Echo Properties 

For linear diffusion both the source and the drain distributions are Gaussian 
distributions. However, in general the distributions are not equal. 

The Diffusion Echo drain distribution is the local convolution filter for the 
diffusion process: $L(x; t) = \int L(q; 0)\, D_x(q)\, dq = \int I(q)\, D_x(q)\, dq$ 

The distributions can be interpreted as affinity measures. However, note that 
in general both $S_p(q) \neq S_q(p)$ and $D_q(p) \neq D_p(q)$. 

Since both source and drain are unity distributions they can also be interpreted 
as probability distributions. The source distribution $S_p(q)$ (or the drain 
distribution $D_q(p)$) states the probability for an "atom" originating at point p 
to end at point q as a result of the diffusion. 

The maximum of both source and drain distributions remains at the origin 
of the distribution. Mean and higher order moments are in general not located 
at the origin and can be used to characterise the distributions. 

The definitions are applicable for images of arbitrary dimensions. 
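
The two-step construction can be sketched for a one-dimensional signal. This is a minimal illustration under stated assumptions, not the implementation used for the figures: it uses an unregularised explicit Perona-Malik step, and the names `pm_step_matrix`, `apply_weights` and `diffusion_echo`, the step size and the threshold are all hypothetical choices.

```python
def pm_step_matrix(signal, lam, tau):
    """Weighting matrix A of one explicit 1-D Perona-Malik step (new = A * old)."""
    n = len(signal)
    # diffusivity on the link between pixel i and pixel i + 1
    g = [1.0 / (1.0 + ((signal[i + 1] - signal[i]) / lam) ** 2) for i in range(n - 1)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        left = tau * g[i - 1] if i > 0 else 0.0
        right = tau * g[i] if i < n - 1 else 0.0
        if i > 0:
            A[i][i - 1] = left
        if i < n - 1:
            A[i][i + 1] = right
        A[i][i] = 1.0 - left - right  # each row is a weighted average
    return A

def apply_weights(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def diffusion_echo(image, lam=10.0, tau=0.25, steps=20):
    n = len(image)
    L = [float(v) for v in image]
    # one auxiliary impulse image per source pixel p
    sources = [[1.0 if q == p else 0.0 for q in range(n)] for p in range(n)]
    for _ in range(steps):
        A = pm_step_matrix(L, lam, tau)  # weights are steered by the image itself
        L = apply_weights(A, L)
        sources = [apply_weights(A, s) for s in sources]  # same weighting
    # drain distribution: D_q(p) = S_p(q)
    drains = [[sources[p][q] for p in range(n)] for q in range(n)]
    return L, sources, drains

image = [0, 0, 0, 0, 100, 100, 100, 100]  # a step edge
L, S, D = diffusion_echo(image)
```

In this sketch every drain distribution sums to one and reproduces the diffused value as a weighted sum of the original image, and the drain of the pixel just left of the edge draws almost nothing from across the edge. Computing all sources requires one auxiliary image per pixel, which is the quadratic cost noted in the conclusion.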



3.2 Diffusion Echo Visualisations 

The Diffusion Echo is a summary of the diffusion process up to a certain time/scale. 
In the following we show that illustrations using this principle offer significantly 
more information than the illustrations in the previous section. 



Basic Comparison. In figure 2 we illustrate this for the four diffusion schemes 
presented in section 2. The figure displays the Diffusion Echo drain distributions 
for the five selected points in the test image from figure 1. These are equivalent 
to the local convolution filters that would yield the diffusion directly. 

The ellipses in figure 2 highlight the properties of the diffusion schemes. Linear 
diffusion uses the same diffusion tensor at all points. The isotropic non-linear 

Fig. 2. Diffusion Echo drain distributions. The top row shows ellipse-illustrations 
for the four diffusion schemes (linear, Perona-Malik, EED, and CED). The cen- 
ter row shows the corresponding drain distributions. Note that the distribution 
is computed separately at each point — the illustrations are mosaics of these 
separate illustrations. By definition, each distribution has the same total energy 
(they are unity filters), but they are scaled individually for better visual appear- 
ance. The bottom row shows close-ups of the distributions for the point right 
below the center of the triangle. 

An explicit approximation scheme with a nonnegativity discretisation was used 
with t = 20 [9]. The edge threshold parameter is set to 220 for all schemes. This 
corresponds to characterising the contour around the triangle as edge and the 
contour around the rectangle as non-edge. The regularisation scale is 1.2. 



Perona-Malik reduces the diffusion gradually, determined by the gradient 
magnitude compared to a soft threshold value. The anisotropic EED scheme reduces 
diffusion across edges quite aggressively but maintains full diffusion along the 
edges. Finally, the CED scheme reduces diffusion perpendicular to the gradient 
as well at corner-like structures. However, the Diffusion Echo distributions reveal 
that the differences between the schemes are not quite as characteristic. Apparently, 
there is more of a smooth transition between the schemes, as the existence 
of the GAN scheme implies. Even though the Perona-Malik scheme is isotropic it 
has a preferred diffusion direction along the edge. This is much more pronounced 
for the anisotropic EED scheme but not qualitatively different. 

Effects of Discretisation Scheme. Apart from the relative differences between 
the schemes, it appears that the Edge Enhancing scheme is not quite able 
to enhance the straight edges as well as in previous publications [9]. This is 
simply because of the numerical approximation scheme. For the previous 
illustrations we used the implementation that ensures non-negative weights in the 
local diffusion stencil ([9], page 95). This restricts the spectral condition number 
of the diffusion tensor to be below 5.8284, meaning that the local degree of 
anisotropy is limited. The eigenvalues are correspondingly limited such that 
$\lambda_1 < 5.8284\, \lambda_2$. For the Edge Enhancing scheme where $\lambda_1 = 1$ this sets a lower 
limit on $\lambda_2$ and thereby some diffusion across the edges is allowed. 

In figure 3 the same diffusion processes have been repeated using the 
non-restricted, standard approximation scheme [9, 8]. It is apparent that a more 
effective preservation of the edges is possible. The shapes of the distributions are 
especially interesting at the corners. The different abilities of the schemes with 
respect to supporting diffusion along the edge through the corner are evident. 

The illustration clearly reveals that the change of discretisation scheme has 
a major effect on the diffusion for some of the schemes. 







Fig. 3. Diffusion Echo drain distributions for the four diffusion schemes (linear, 
Perona-Malik, Edge Enhancing, and Corner Enhancing) where the nonnegativity 
approximation scheme used in figure 2 is replaced with the simpler standard 
approximation scheme. The standard scheme allows more pronounced anisotropy. 
The bottom row here shows close-ups for the point inside the left corner of the 
triangle. 

The Visualisation Ability. The previous illustrations show that the Diffusion 
Echo is able to visualize properties of the diffusion schemes that are not 
otherwise apparent. Extended experiments with the diffusion schemes allow 
similar intuition but the Diffusion Echo illustrations offer this understanding 
directly. 

In the following sections we offer a few appetisers indicating that the Diffusion 
Echo can be used for more than just illustrations. 



4 Diffusion Echo Application: Grouping 

The Diffusion Echo expresses affinity between points in an image. This affinity 
measure can be used for grouping of pixels into regions or grouping of feature 
points in general. 

Since the affinity measure is defined in terms of the underlying diffusion, the 
diffusion scheme needs to be appropriate for the specific grouping task. 

Edge detectors often produce edge pieces with small gaps in-between. The 
Diffusion Echo for the Edge Enhancing diffusion scheme would be a very appro- 
priate measure for determining the connectivity of the edge pieces. 

Attributes of the Diffusion Echo can be used for determining grouping as 
well. An example could be using the shape of the distribution to guide grouping 
of pixels into regions. This is illustrated by the distributions in figure 4. 

The differences in diffusivity from the low diffusivity near-edge areas to the 
high diffusivity areas away from the edges cause a flux away from the edges. 
Thereby the means of the distributions move away from their origins in a 
direction away from the edges (if any edges are within "striking distance" depending 
on diffusion parameters, especially regularisation and diffusion time). This can 
be used to create a drift field that can be used to group the pixels. The field will 
create a sink in each region that the pixels drift towards. 
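
The drift vector of a single pixel is the offset from the pixel to the mean of its distribution. A minimal sketch, assuming the distribution is given as a small 2-D array of weights; the function name `drift_vector` and the two toy distributions are hypothetical.

```python
def drift_vector(distribution, origin):
    """Offset (row, column) from the origin pixel to the mean of its distribution."""
    total = sum(sum(row) for row in distribution)
    mean_row = sum(r * v for r, row in enumerate(distribution) for v in row) / total
    mean_col = sum(c * v for row in distribution for c, v in enumerate(row)) / total
    return (mean_row - origin[0], mean_col - origin[1])

# Far from any edge: a symmetric distribution, so no drift.
symmetric = [[0.0, 0.1, 0.0],
             [0.1, 0.6, 0.1],
             [0.0, 0.1, 0.0]]
# Next to an edge on the upper left: weight is pushed down and to the right.
skewed = [[0.0, 0.0, 0.0],
          [0.0, 0.6, 0.3],
          [0.0, 0.1, 0.0]]

no_drift = drift_vector(symmetric, (1, 1))
drift = drift_vector(skewed, (1, 1))
```

Evaluating such a vector at every pixel gives the drift field; pixels can then be grouped by following the vectors to the sink of their region.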



5 Diffusion Echo Application: Deep Structure Summary 

Instead of grouping the pixels individually, the Diffusion Echo can also be used 
for grouping image regions. This could be an alternative to existing multi-scale 
linking schemes. An example of this is shown in figure 5, where grouping based 
on the Diffusion Echo is compared to multi-scale watershed segmentation. 

The Diffusion Echo grouping uses a simple threshold to determine whether 
two neighbouring regions are merged into one region. This threshold is compared 
to the average affinity measure between pairs of pixels in the two regions using 
the affinity measure directly from the Diffusion Echo source distribution. The 
result is a simple flooding-like algorithm. 
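
The region grouping can be sketched as a single pass over neighbouring region pairs. This is a minimal illustration, assuming that two regions are merged when their average source affinity is at least the threshold (the text does not state the direction of the comparison); the names `average_affinity` and `group_regions` and the toy affinity values are hypothetical.

```python
def average_affinity(source, region_a, region_b):
    """Mean of S_p(q) over all pixel pairs with p in region_a and q in region_b."""
    pairs = [(p, q) for p in region_a for q in region_b]
    return sum(source[p][q] for p, q in pairs) / len(pairs)

def group_regions(source, regions, adjacent_pairs, threshold):
    """Merge neighbouring regions whose average affinity reaches the threshold."""
    parent = {label: label for label in regions}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    for a, b in adjacent_pairs:
        if average_affinity(source, regions[a], regions[b]) >= threshold:
            parent[find(a)] = find(b)
    return {label: find(label) for label in regions}

# Toy example with four pixels; source[p][q] plays the role of S_p(q).
source = [[0.690, 0.300, 0.005, 0.005],
          [0.300, 0.698, 0.001, 0.001],
          [0.005, 0.001, 0.600, 0.394],
          [0.005, 0.001, 0.394, 0.600]]
regions = {"a": [0], "b": [1], "c": [2, 3]}
groups = group_regions(source, regions, [("a", "b"), ("b", "c")], threshold=0.01)
```

Regions "a" and "b" end up in one group because their pixels exchange a large share of their weight, while region "c" stays separate.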

Even though both examples are quite simple, they illustrate that the Diffusion 
Echo distributions capture what can be considered a deep structure summary 
to such an extent that even simple attributes offer powerful grouping abilities. 

Fig. 4. Diffusion Echo grouping drift fields. For the five points a vector from 
the point to the mean for the drain distribution gives a drift field that can be 
used for grouping. For visualisation purposes, the vectors have been scaled to 10 
times their actual lengths. The vectors from the points just inside the corners 
aim towards the center of the triangle. The vector for the point just below the 
triangle (it is a single pixel outside) aims away from the triangle. The remaining 
vectors are practically zero vectors since the parameters for the Edge Enhancing 
scheme dictate that there are no edges near them; their distributions are 
approximately Gaussian with means at their origins. 



6 Conclusion 

We introduce the Diffusion Echo: the distributions that equip non-linear diffusion 
schemes with what corresponds to the Gaussian for linear diffusion. 

The Diffusion Echo offers illustrations of non-linear diffusion schemes that 
reveal the true local diffusion in an intuitive manner. 

Furthermore, we argue that the affinity nature of the distributions can be 
used effectively for grouping. This is demonstrated through two examples. First, the 
Diffusion Echo distributions are used to generate a drift field, where each pixel is 
equipped with a vector stating the preferred direction of grouping. Secondly, we 
group regions by the average affinity between pixels in the regions. This proves 
to be a simple but effective region grouping scheme. 

Computation time and memory requirement for the computation of the Diffusion 
Echoes are quadratic in the number of image pixels, which obviously is 
problematic for larger images. However, this article is only an introduction of the 
principles. Future applications will not use the basic definition directly. Where 
linear attributes of the Diffusion Echo distributions are needed, these can be 
computed directly during the diffusion iterations (in linear time and memory). 

Diffusion Echo based methods imply a shift from the diffusion scheme being 
an underlying information-simplifying step to being the central information- 
collecting process. The basic principle behind Diffusion Echo based methods is 
the diffusion knows. Future work will reveal the application tasks where the 
proper non-linear diffusion scheme really is omnipotent. 

Acknowledgements: Part of the implementations used for this article are 
based on code originally developed by Joachim Weickert, University of Mannheim, 
and Ole Fogh Olsen, The IT University of Copenhagen. 

Fig. 5. Segmentation by grouping of watershed regions using the Diffusion Echo. 
Top row shows a simple example where the segmentation task is to capture the 
rectangle. First the test image followed by a watershed segmentation at low scale. 
Third image shows the multi-scale linking of the regions where the underlying 
diffusion is 30 levels of the edge enhancing scheme from t = 1 to t = 600 [3]. 
The rightmost image is the result of flooding with the threshold 0.0025 using 
the average Diffusion Echo source affinity between neighbouring regions. 
Bottom row is equivalent where the segmentation task is to capture the ventricles 
from a data set from the Internet Brain Segmentation Repository [1]. Here, the 
linking uses 10 levels from t = 0.6 to t = 80. The flooding threshold 
is 0.0059. 

The Diffusion Echo flooding method groups the desired regions for both images. 

References 

1. The internet brain segmentation repository, 1999. MR brain data set 788_6_m and 
its manual segmentation was provided by the Center for Morphometric Analysis 
at MGH, http://neuro-www.mgh.harvard.edu/cma/ibsr. 

2. F. Catte, P.-L. Lions, J.-M. Morel, and T. Coll. Image selective smoothing and 
edge detection by nonlinear diffusion. SIAM J. of Num. An., 29:182-193, 1992. 

3. Erik Dam. Evaluation of diffusion schemes for watershed segmentation. Master's 
thesis, University of Copenhagen, 2000. Technical report 2000/1 on 
http://www.diku.dk/research/techreports/2000.htm 

4. Erik Dam and Mads Nielsen. Non-linear diffusion for interactive multi-scale 
watershed segmentation. MICCAI 2000, vol 1935 of LNCS, 216-225. Springer, 2000. 

5. Jan J. Koenderink. The structure of images. Biol. Cybern., 50:363-370, 1984. 

6. Tony Lindeberg. Scale-Space Theory in Computer Vision. Kluwer, 1994. 

7. Pietro Perona and Jitendra Malik. Scale-space and edge detection using anisotropic 
diffusion. IEEE PAMI, 12(7):629-639, July 1990. 

8. H. Scharr and J. Weickert. An anisotropic diffusion algorithm with optimized 
rotation invariance. Mustererkennung 2000, DAGM, pp 460-467. Springer, 2000. 

9. Joachim Weickert. Anisotropic Diffusion in Image Processing. Teubner, 1998. 

10. Andrew P. Witkin. Scale-space filtering. In Proceedings of International Joint 
Conference on Artificial Intelligence, pages 1019-1022, Karlsruhe, Germany, 1983. 




Bilateral Filtering and Anisotropic Diffusion: 
Towards a Unified Viewpoint 



Danny Barash 

Hewlett-Packard Laboratories Israel, Technion City 
Haifa 32000, Israel 



Abstract. Bilateral filtering has recently been proposed as a nonitera- 
tive alternative to anisotropic diffusion. In both these approaches, images 
are smoothed while edges are preserved. Unlike anisotropic diffusion, bi- 
lateral filtering does not involve the solution of partial differential equa- 
tions and can be implemented in a single iteration. Despite the differ- 
ence in implementation, both methods are designed to prevent averaging 
across edges while smoothing an image. Their similarity suggests they 
can somehow be linked. Using a generalized representation for the in- 
tensity, we show that both can be related to adaptive smoothing. As a 
consequence, bilateral filtering can be applied to denoise and coherence- 
enhance degraded images with approaches similar to anisotropic diffu- 
sion. 



1 Introduction 

In a wide variety of applications, it is necessary to smooth an image while pre- 
serving its edges. Simple smoothing operations such as low-pass filtering, which 
does not take into account intensity variations within an image, tend to blur 
edges. Anisotropic diffusion 0 was proposed as a general approach to accom- 
plish edge-preserving smoothing. This approach has grown to become a well- 
established tool in early vision. 

This paper examines the relation between bilateral filtering, a recent ap- 
proach proposed in [Zj, and anisotropic diffusion. The paper is divided as fol- 
lows. Section II presents the connection between anisotropic diffusion and adap- 
tive smoothing. The goal is to suggest a viewpoint in which adaptive smoothing 
serves as the link between bilateral filtering and anisotropic diffusion. In Section 
III, adaptive smoothing is extended, which results in bilateral filtering. The pos- 
sible unification of bilateral filtering and anisotropic diffusion is then discussed. 
Sections IV and V take advantage of the resultant link, borrowing the use of 
the geometric interpretation to anisotropic diffusion and applying it in bilateral 
filtering. Section IV examines the convolution kernel of a bilateral filter, from 
the standpoint that color images are 2D surfaces embedded in 5D (x,y,R,G,B) 
space. In Section V, conclusions are drawn and suggestions are given for future 
examination of the proposed unified viewpoint. 

2 Anisotropic Diffusion and Adaptive Smoothing 



We first examine the connection between anisotropic diffusion and adaptive 
smoothing, which was outlined in [5]. Given an image $I^{(t)}(x)$, where $x = (x_1, x_2)$ 
denotes space coordinates, an iteration of adaptive smoothing yields: 

$$I^{(t+1)}(x) = \frac{\sum_{i=-1}^{+1} \sum_{j=-1}^{+1} I^{(t)}(x_1+i, x_2+j)\, w^{(t)}(x_1+i, x_2+j)}{\sum_{i=-1}^{+1} \sum_{j=-1}^{+1} w^{(t)}(x_1+i, x_2+j)} \tag{1}$$

where the convolution mask is defined as: 

$$w^{(t)}(x_1, x_2) = \exp\left(-\frac{|d^{(t)}(x_1, x_2)|^2}{2k^2}\right) \tag{2}$$

where k is the variance of the Gaussian mask and $d^{(t)}(x_1, x_2)$ is chosen to 
depend on the magnitude of the gradient computed in a 3 x 3 window: 

$$d^{(t)}(x_1, x_2) = \sqrt{G_{x_1}^2 + G_{x_2}^2} \tag{3}$$

where 

$$(G_{x_1}, G_{x_2}) = \left(\frac{\partial I^{(t)}(x_1, x_2)}{\partial x_1}, \frac{\partial I^{(t)}(x_1, x_2)}{\partial x_2}\right) \tag{4}$$

noting the similarity of the convolution mask with the diffusion coefficient in 
anisotropic diffusion. 



It was shown [5] that equation (1) is an implementation of anisotropic 
diffusion. Briefly sketched, let us consider the case of a one-dimensional signal $I^t(x)$ 
and reformulate the averaging process as follows: 

$$I^{t+1}(x) = c_1 I^t(x - 1) + c_2 I^t(x) + c_3 I^t(x + 1) \tag{5}$$

with 

$$c_1 + c_2 + c_3 = 1 \tag{6}$$

Therefore, it is possible to write the above iteration scheme as follows: 

$$I^{t+1}(x) - I^t(x) = c_1 (I^t(x - 1) - I^t(x)) + c_3 (I^t(x + 1) - I^t(x)) \tag{7}$$

Taking $c_1 = c_3$, this reduces to: 

$$I^{t+1}(x) - I^t(x) = c_1 (I^t(x - 1) - 2 I^t(x) + I^t(x + 1)) \tag{8}$$

which is a discrete approximation of the linear diffusion equation: 

$$\frac{\partial I}{\partial t} = c \nabla^2 I \tag{9}$$

However, when the weights are space-dependent, one should write the weighted 
averaging scheme as follows: 

$$I^{t+1}(x) = c^t(x - 1) I^t(x - 1) + c^t(x) I^t(x) + c^t(x + 1) I^t(x + 1) \tag{10}$$

with 

$$c^t(x - 1) + c^t(x) + c^t(x + 1) = 1 \tag{11}$$

This can be rearranged as: 

$$I^{t+1}(x) - I^t(x) = c^t(x - 1)(I^t(x - 1) - I^t(x)) + c^t(x + 1)(I^t(x + 1) - I^t(x)) \tag{12}$$

or 

$$I^{t+1}(x) - I^t(x) = c^t(x + 1)(I^t(x + 1) - I^t(x)) - c^t(x - 1)(I^t(x) - I^t(x - 1)) \tag{13}$$

which is an implementation of anisotropic diffusion, proposed by Perona and 
Malik 

$$\frac{\partial I}{\partial t} = \nabla \cdot (c(x_1, x_2) \nabla I) \tag{14}$$

where $c(x_1, x_2)$ is the nonlinear diffusion coefficient, typically taken as: 

$$c(x_1, x_2) = g(\|\nabla I(x_1, x_2)\|) \tag{15}$$

where $\|\nabla I\|$ is the gradient magnitude, and $g(\|\nabla I\|)$ is an "edge-stopping" 
function. This function is chosen to satisfy $g(x) \to 0$ when $x \to \infty$ so that the 
diffusion is stopped across edges. 

Thus, a link between anisotropic diffusion (14) and adaptive smoothing (1) 
is established. In the next section, we show the link between adaptive smoothing 
and bilateral filtering. 
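
The rearrangement from (10) to (12) can be checked on a small one-dimensional example. This is a minimal sketch, assuming a central-difference gradient with replicated borders and a three-pixel window; the function names and the parameter k = 5 are illustrative choices.

```python
import math

def mask_weights(signal, k):
    """Gaussian mask of equation (2) evaluated on a 1-D central-difference gradient."""
    n = len(signal)
    grad = [(signal[min(i + 1, n - 1)] - signal[max(i - 1, 0)]) / 2.0 for i in range(n)]
    return [math.exp(-(g * g) / (2.0 * k * k)) for g in grad]

def adaptive_smoothing_step(signal, k):
    """Weighted averaging form, a 1-D analogue of equations (1) and (10)."""
    n = len(signal)
    w = mask_weights(signal, k)
    out = []
    for x in range(n):
        idx = [max(x - 1, 0), x, min(x + 1, n - 1)]
        den = sum(w[i] for i in idx)
        out.append(sum(signal[i] * w[i] for i in idx) / den)
    return out

def diffusion_form_step(signal, k):
    """Same update written as weighted differences, as in equation (12)."""
    n = len(signal)
    w = mask_weights(signal, k)
    out = []
    for x in range(n):
        lo, hi = max(x - 1, 0), min(x + 1, n - 1)
        den = w[lo] + w[x] + w[hi]
        out.append(signal[x]
                   + (w[lo] / den) * (signal[lo] - signal[x])
                   + (w[hi] / den) * (signal[hi] - signal[x]))
    return out

step_edge = [0.0, 0.0, 0.0, 0.0, 100.0, 100.0, 100.0, 100.0]
averaged = adaptive_smoothing_step(step_edge, k=5.0)
diffused = diffusion_form_step(step_edge, k=5.0)
```

Both forms give the same result, and the step edge survives because the mask is practically zero where the gradient is large.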



3 Bilateral Filtering and Adaptive Smoothing 

Bilateral filtering was introduced [Zj as a nonlinear filter which combines domain 
and range filtering. Given an input image f{x), using a continuous representation 
notation as in [Z|, the output image h{x) is obtained by: 

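$$h(x) = \frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\xi)\, c(\xi, x)\, s(f(\xi), f(x))\, d\xi}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} c(\xi, x)\, s(f(\xi), f(x))\, d\xi} \tag{16}$$

where $x = (x_1, x_2)$, $\xi = (\xi_1, \xi_2)$ are space variables and $f = (f_R, f_G, f_B)$ is the 
intensity. The full vector notation is used in order to avoid confusion in what 
follows. The convolution mask is the product of the functions c and s, which 
represent 'closeness' (in the domain) and 'similarity' (in the range), respectively. 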



Effectively, we claim that a discrete version of bilateral filtering can be writ- 
ten as follows (using the same notation as in the previous section, only I is now 
a 3-element vector which describes color images): 






,(t) 






+s 



j(‘+i)(a:) 



(17) 
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with the weights given by: 






(a;,^) = exp( 



2al 



)exp( 






24 



) 



(18) 



where S is the window size of the filter, which is a generalization of (1). In order 
to prove our claim and demonstrate the relation to (1), we use a generalized 
representation for the intensity I. In principle, the first element corresponds to 
the range and the second element corresponds to the domain of the bilateral 
filter. Defining the generalized intensity as: 



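$$\hat{\mathbf{I}} = \left( \frac{\mathbf{I}(\mathbf{x})}{\sigma_R},\; \frac{\mathbf{x}}{\sigma_D} \right) \tag{19}$$

we now take $d^{(t)}(\mathbf{x})$ to be the difference between generalized intensities at two
points in a given $S \times S$ window, $\|\hat{\mathbf{I}}(\boldsymbol{\xi}) - \hat{\mathbf{I}}(\mathbf{x})\|$, the latter being a global extension
to (3). In (3), the gradient, being the local difference between two neighboring
points in a 3 × 3 window, was taken as a distance measure. Starting from (2),
and setting $k = 1$ since the variances $\sigma_D$ and $\sigma_R$ are already included in the
generalized intensity, we obtain:

$$\begin{aligned}
w^{(t)}(\mathbf{x}) &= \exp\left(-\frac{1}{2}\left\|\hat{\mathbf{I}}(\boldsymbol{\xi}) - \hat{\mathbf{I}}(\mathbf{x})\right\|^2\right) \\
&= \exp\left(-\frac{1}{2}\left\|\left(\frac{\mathbf{I}(\boldsymbol{\xi})}{\sigma_R}, \frac{\boldsymbol{\xi}}{\sigma_D}\right) - \left(\frac{\mathbf{I}(\mathbf{x})}{\sigma_R}, \frac{\mathbf{x}}{\sigma_D}\right)\right\|^2\right) \\
&= \exp\left(-\frac{1}{2}\left\|\left(\frac{\mathbf{I}(\boldsymbol{\xi}) - \mathbf{I}(\mathbf{x})}{\sigma_R}, \frac{\boldsymbol{\xi} - \mathbf{x}}{\sigma_D}\right)\right\|^2\right) \\
&= \exp\left(-\left\{\frac{\|\mathbf{I}(\boldsymbol{\xi}) - \mathbf{I}(\mathbf{x})\|^2}{2\sigma_R^2} + \frac{\|\boldsymbol{\xi} - \mathbf{x}\|^2}{2\sigma_D^2}\right\}\right)
\end{aligned} \tag{20}$$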



Because these are the weights used in the bilateral filter, as can be verified 
in (18), equation (20) provides a direct link between adaptive smoothing and 
bilateral filtering. In a general framework of adaptive smoothing, one can take 
spatial and spectral distance measures along with increasing the window size, 
abandoning the need to perform several iterations. Taken as such, we get the 
bilateral filtering implementation of |3 which can be viewed as a generalization 
of adaptive smoothing. 
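To make the weighted average (17) with Gaussian closeness and similarity weights as in (18) concrete, here is a minimal brute-force sketch for a grey-level image. The function name, the parameter `half_window` (playing the role of S), the reflecting padding at the border, and the default values are assumptions of this sketch, not details given in the text.

```python
import numpy as np

def bilateral_filter(image, half_window=3, sigma_d=2.0, sigma_r=20.0):
    """Single-pass bilateral filter over a (2S+1) x (2S+1) window."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    padded = np.pad(img, half_window, mode="reflect")
    numerator = np.zeros_like(img)
    denominator = np.zeros_like(img)
    for di in range(-half_window, half_window + 1):
        for dj in range(-half_window, half_window + 1):
            # Neighbour image shifted by (di, dj).
            shifted = padded[half_window + di: half_window + di + rows,
                             half_window + dj: half_window + dj + cols]
            # Closeness in the domain (spatial distance).
            closeness = np.exp(-(di * di + dj * dj) / (2.0 * sigma_d ** 2))
            # Similarity in the range (intensity distance).
            similarity = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            weight = closeness * similarity
            numerator += weight * shifted
            denominator += weight
    return numerator / denominator

# A clean step edge: the similarity term suppresses averaging across it.
step = np.zeros((16, 16))
step[:, 8:] = 100.0
filtered = bilateral_filter(step, half_window=3, sigma_d=2.0, sigma_r=10.0)
```

With `sigma_r` much smaller than the edge height, pixels on the other side of the edge receive a negligible weight, so the step is reproduced essentially unchanged in a single pass.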



4 Geometric Interpretation 

In the previous two sections, it was shown that anisotropic diffusion and bilateral
filtering can be linked through adaptive smoothing. Specifically, the diffusion
coefficient in (14) relates to the convolution mask and in particular to the
distance measure which is used in the bilateral filter. Similarly, the relation between
anisotropic diffusion and robust statistics was described in p. 

For illustration, Figure 2 demonstrates two different ways of performing edge-
preserving smoothing on the original image in Figure 1. The results of
nonlinear diffusion filtering and of bilateral filtering are similar but
not identical, since the parameters differ: a large window size was intentionally
used with the bilateral filter, and several iterations with anisotropic
diffusion. This is the most natural setup for each of the two methods.




Fig. 1. Original Image: Laplace. 



In color images, it was demonstrated in P that the image can be represented 
as 2D surface embedded in the 5D spatial-color space and denoising can be 
achieved by using the Beltrami flow. Related ideas can be found in p]], |2j. It is 
possible to borrow this notion outlined in P and choose the following spectral 
distance measure for the bilateral filter: 



$$\|\mathbf{I}(\boldsymbol{\xi}) - \mathbf{I}(\mathbf{x})\| = \sqrt{(\Delta R)^2 + (\Delta G)^2 + (\Delta B)^2} \tag{21}$$

Note that only the spectral distance measure of the range part is given in (21);
it can be inserted directly into the similarity function s of the bilateral filter as
implemented in P . The spatial distance measure of the domain part remains the
same as with grey-level images. Written that way, one can distinguish between
closeness in the domain and similarity in the range, with the advantage of
treating the two separately.

Fig. 2. Edge-preserving smoothing: Anisotropic diffusion with 20 time-steps of
$\tau = 1.0$ (left) and Gaussian bilateral filtering with a 30 × 30 window size, $\sigma_D = 5.0$
and $\sigma_R = 30.0$ (right). $\sigma_D$ and $\sigma_R$ are bilateral filtering parameters, see [6] for
details.

However, it is also possible to write (21) equivalently by
combining the spatial and spectral distance terms. Using the generalized
intensity defined in (19), the full distance measure can be written as:

$$d^{(t)}(x_1, x_2) = \sqrt{(\Delta x_1)^2 + (\Delta x_2)^2 + \beta^2\left((\Delta R)^2 + (\Delta G)^2 + (\Delta B)^2\right)} \tag{22}$$



where $\beta = \sigma_D/\sigma_R$. Note that this distance measure can be plugged into the
convolution mask of adaptive smoothing (2) as one term with $k = \sigma_D$. It is now
possible to take advantage of a geometric interpretation in which color images
are 2D surfaces embedded in the 5D $(x, y, R, G, B)$ space. Equation (22) is then
analogous to the local measure:

$$ds^2 = dx^2 + dy^2 + \beta^2\left(dR^2 + dG^2 + dB^2\right) \tag{23}$$



which is the geometric arclength in the hybrid spatial-color space discussed 
in PI, 0. 
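As a short worked check, and assuming that the adaptive smoothing weight (2) has the Gaussian form $\exp(-d^2/2k^2)$ (the form implied by the first line of (20) for $k = 1$), substituting the combined distance (22) with $k = \sigma_D$ and $\beta = \sigma_D/\sigma_R$ separates into a domain factor and a range factor:

```latex
\begin{aligned}
\exp\!\left(-\frac{\bigl(d^{(t)}\bigr)^2}{2k^2}\right)\Bigg|_{k=\sigma_D}
&= \exp\!\left(-\frac{(\Delta x_1)^2 + (\Delta x_2)^2
   + \beta^2\bigl((\Delta R)^2 + (\Delta G)^2 + (\Delta B)^2\bigr)}{2\sigma_D^2}\right) \\
&= \exp\!\left(-\frac{(\Delta x_1)^2 + (\Delta x_2)^2}{2\sigma_D^2}\right)
   \exp\!\left(-\frac{(\Delta R)^2 + (\Delta G)^2 + (\Delta B)^2}{2\sigma_R^2}\right),
\qquad \text{since } \frac{\beta^2}{\sigma_D^2} = \frac{1}{\sigma_R^2}.
\end{aligned}
```

Under that assumption, the single-term kernel and the product of closeness and similarity factors describe the same weights.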



5 Conclusions 

The nature of bilateral filtering resembles that of anisotropic diffusion. It is
therefore suggested that the two are related and that a unified viewpoint can
reveal the similarities and differences between the two approaches. Once such an
understanding is reached, it is possible to choose the desired ingredients which are
common to the two frameworks along with the implementation method. The
method can be either applying a nonlinear filter or solving a partial differential
equation.

Adaptive smoothing serves as a link between the two approaches, each of 
which can be viewed as a generalization of the former. In anisotropic diffusion, 
the diffusion coefficient can be generalized to become a ‘structure tensor’ P) 
which then leads to phenomena such as edge-enhancing and coherence-enhancing 
diffusions. In bilateral filtering, the kernel (which plays the same role as the diffu- 
sion coefficient) is extended to become globally dependent on intensity, whereas 
a gradient can only yield local dependency among neighboring pixels. Thus, the 
window of the filter becomes much bigger in size than the one used in adaptive 
smoothing and there is no need to perform several iterations. We note that this
extension is general in its own right, meaning that a variety of yet unexplored
possibilities exist for constructing a kernel with an optimal window size, as well
as for designing the best closeness and similarity functions for a given application.

The general hybrid spatial-color formulation provides a geometric interpretation
with which the bilateral convolution kernel can be viewed as an approximation
to the geometric arclength in the 5D hybrid spatial-color space. Ideas
that are based on the geometric interpretation, such as coherence-enhancement,
can be borrowed from anisotropic diffusion and applied to some degree of
approximation in bilateral filtering.

Two practical goals emerge from comparing anisotropic diffusion and bilateral
filtering. The first is a further attempt to reduce the number
of iterations needed in anisotropic diffusion (which can be achieved by efficient
numerical schemes such as 0, less prone to stability problems) while retaining
the same accuracy as in bilateral filtering. The second is to reduce the window
size and investigate other means of minimizing the computations associated
with bilateral filtering. The two approaches are related to each other, and an
exchange of new ideas between them can be rewarding.
Abstract. Efficient numerical schemes for nonlinear diffusion filtering
based on additive operator splitting (AOS) were introduced in [10]. AOS
schemes are efficient and unconditionally stable, yet their accuracy is low.
Future applications of nonlinear diffusion filtering may require additional
accuracy at the expense of a relatively modest cost in computations and
complexity.

To investigate the effect of higher accuracy schemes, we first examine the 
Crank-Nicolson and DuFort-Frankel second-order schemes in one dimen- 
sion. We then extend the AOS schemes to take advantage of the higher 
accuracy that is achieved in one dimension, by using symmetric mul- 
tiplicative splittings. Quantitative comparisons are performed for small 
and large time steps, as well as visual examination of images to find out 
whether the improvement in accuracy is noticeable. 



1 Introduction 

There are various applications of nonlinear diffusion filtering |f)l9| in image pro- 
cessing. Such ‘filters’ can be used for denoising, gap completion and computer 
aided quality control, among many other tasks. These kinds of applications
demand high processing capabilities. The balance between high accuracy and
computational efficiency is therefore an important issue in the design of such filters,
and is expected to play an increasing role in future applications.

In this paper, an accurate numerical scheme is proposed which is an extension 
to Weickert-Romeny-Viergever’s additive operator splitting (AOS) schemes [1 1)] . 
These schemes are efficient and reliable, in the sense that they permit the use 
of larger time steps, whereas the straight-forward explicit schemes, that were 
proposed originally in Perona and Malik’s classical paper j0|, are restricted to 
small time steps in order to ensure stability. However, the AOS schemes are 
limited in their accuracy to first order in time even for the linear case. We there- 
fore examine the possibility of increasing the accuracy in one- dimension, along 
with preserving this increase in accuracy by a suitable split-operator scheme. 
Our approach closely resembles the use of alternating direction implicit (ADI) 
type schemes |S|, which are second order in time for the linear case. We show 
that for large time steps the gain in accuracy can be visualized. These ADI-like
schemes can be applied in certain cases with a single iteration, effectively a large
time step, or with very few iterations, in order to better approximate many
iterations with smaller time steps.



2 Nonlinear Diffusion Filtering 



Let us first provide a model for nonlinear diffusion in image filtering. We briefly 
describe the filter proposed by Catte, Lions, Morel and Coll PJ. The CLMC filter 
is a version of the Perona and Malik model 0 for image selective smoothing that 
was used in HOI as a benchmark for studying various numerical schemes. The 
basic equation which governs nonlinear diffusion filtering is 

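$$\frac{\partial u}{\partial t} = \nabla \cdot \left( g(|\nabla u_\sigma|^2)\, \nabla u \right), \tag{1}$$

where $u(x,t)$ is a filtered version of the original image. The original image $f(x)$
is given as the initial condition

$$u(x,0) = f(x), \tag{2}$$

and reflecting boundary conditions are used

$$\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial\Omega, \tag{3}$$

where $n$ is the normal to the image boundary $\partial\Omega$.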

The goal of selective smoothing in edge-preserving applications is to reduce 
smoothing across edges. In order to achieve this goal, the diffusivity g is chosen 
as a rapidly decreasing function of the gradient magnitude (edge indicator). 
Specifically, the following form for the diffusivity is suggested in the CLMC 
filter 



1 (s < 0) 

1 - exp (s > 0)> 



(4) 



where $\lambda = 10.0$ throughout this paper. In addition, CLMC suggest a presmoothing
mechanism at each time step, in which the image $u$ is convolved with a
Gaussian of standard deviation $\sigma$ to obtain $u_\sigma$. This can be achieved by solving
the linear diffusion equation ($g = 1$)

$$\frac{\partial u}{\partial t} = \nabla \cdot (\nabla u), \tag{5}$$

for a very small time step of size $T = \sigma^2/2$. This step is called regularization, or
presmoothing, and can be approximated by any of the splitting schemes that will
be mentioned in the paper. For example, a simple locally one-dimensional (LOD)
scheme is a convenient choice. In the remainder of this paper, $\sigma = 0.25$ is chosen
for the presmoothing, except when quantitative comparisons are performed and
presmoothing is excluded.
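The relation $T = \sigma^2/2$ between diffusion time and Gaussian width can be checked numerically. The sketch below, which assumes a simple explicit 1D scheme with unit grid spacing (the function name, `tau`, and the larger illustrative value $\sigma = 2$ are choices made here, not taken from the text), diffuses a unit impulse for time $T$ and measures the standard deviation of the resulting kernel.

```python
import numpy as np

def linear_diffusion_1d(u0, total_time, tau=0.1):
    """Explicit scheme for u_t = u_xx with reflecting boundaries."""
    steps = int(round(total_time / tau))
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        d = np.diff(u)        # differences between neighbouring samples
        u[:-1] += tau * d
        u[1:] -= tau * d
    return u

sigma = 2.0
impulse = np.zeros(101)
impulse[50] = 1.0

# Diffuse for T = sigma^2 / 2 and measure the spread of the response.
kernel = linear_diffusion_1d(impulse, total_time=sigma ** 2 / 2.0)
positions = np.arange(101) - 50
measured_sigma = float(np.sqrt(np.sum(kernel * positions ** 2)))
```

Each explicit step adds a variance of `2 * tau` to the kernel, so after time $T$ the variance is $2T = \sigma^2$, which is why `measured_sigma` agrees with `sigma` here.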
3 One-Dimensional Schemes

This section relies on where the one-dimensional explicit and semi-implicit 
schemes were described. Here we add the Crank-Nicolson and the DuFort-Frankel, 
as possible schemes to perform nonlinear diffusion in one-dimension. For more 
details and theoretical considerations regarding the framework for discrete non- 
linear diffusion scale-spaces, the reader is referred to IHTTII . 

Both the explicit scheme and the semi-implicit scheme are first order in time.
A scheme which is a combination of the two and is second order in time for the
linear case is the Crank-Nicolson scheme

$$\left(I - \frac{\tau}{2} A(u^k)\right) u^{k+1} = \left(I + \frac{\tau}{2} A(u^k)\right) u^k. \tag{6}$$

Another candidate scheme to try and achieve higher accuracy is the DuFort- 
Frankel method (2|. However, its inconsistency results (see Figure 2) in a scheme 
that is not reliable from a certain time step onwards. Nevertheless, an extended 
DuFort-Frankel in higher dimensions that averages the fluctuations at higher 
time steps might perform well in the anisotropic cases like the Beltrami frame- 
work P], or coherence enhancement jOj. Experimental results with all schemes 
are shown in Figures 1, 2, 3 and 4, in which a 1D cross-section of a natural image was
taken (Figure 1) and edge-preserving smoothing was applied using small and
large time steps. It is seen in Figure 1 (right) that the Forward-Euler scheme
becomes unstable for larger time steps. Reducing the time step by two orders
of magnitude can recover an edge-preserved smoothed signal (Figure 1 middle),
but this is inefficient. We are left with Backward-Euler and Crank-Nicolson for
obtaining a robust nonlinear diffused signal at large time steps. An $l_2$ norm error
comparison between the different output signals (see Section 4 on how this is
calculated for images) reveals that in the nonlinear case, the 1D Crank-Nicolson
scheme as is remains first-order accurate in time. This is because the nonlin- 
ear diffusivity term, calculated at a specific time step, interferes with achieving 
higher order accuracy in time. In order to retain second-order accuracy, extrap- 
olation is needed such that the diffusivity is calculated according to two levels of 
time step. Table 1 indicates that a simple extrapolation along with the Crank- 
Nicolson, in the form of = 2 • g”®’" — for each time step, can boost the 
accuracy. We will refer to some more involved extrapolation procedures, such as 
the Douglas Jones predictor-corrector method proposed in |^, in Section 4. 
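The difference in temporal order between the semi-implicit and Crank-Nicolson schemes can be observed in a small experiment. The sketch below is restricted to the linear case (a fixed matrix A, i.e. g = 1), where an exact reference is available from an eigendecomposition; the helper names, the grid size, the step-edge initial signal, and the use of dense solves instead of a tridiagonal solver are all assumptions made for brevity. It does not cover the nonlinear case or the extrapolation of the diffusivity.

```python
import numpy as np

def neumann_laplacian(n):
    """1D second-difference matrix with reflecting boundaries."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i] -= 1.0
        a[i + 1, i + 1] -= 1.0
        a[i, i + 1] += 1.0
        a[i + 1, i] += 1.0
    return a

def theta_scheme(u0, a, tau, steps, theta):
    """theta = 1.0: semi-implicit (backward Euler); theta = 0.5: Crank-Nicolson."""
    eye = np.eye(len(u0))
    lhs = eye - theta * tau * a
    rhs = eye + (1.0 - theta) * tau * a
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = np.linalg.solve(lhs, rhs @ u)
    return u

def exact_linear(u0, a, t):
    """Exact solution of u' = A u for symmetric A."""
    lam, vec = np.linalg.eigh(a)
    return vec @ (np.exp(lam * t) * (vec.T @ u0))

n, total_time = 32, 4.0
a = neumann_laplacian(n)
u0 = np.where(np.arange(n) < n // 2, 0.0, 1.0)   # step edge
reference = exact_linear(u0, a, total_time)

errors = {}
for name, theta in (("backward_euler", 1.0), ("crank_nicolson", 0.5)):
    errors[name] = [
        float(np.linalg.norm(
            theta_scheme(u0, a, tau, int(round(total_time / tau)), theta) - reference))
        for tau in (0.4, 0.2)
    ]

# Error ratio when the time step is halved: about 2 for first order,
# about 4 for second order.
be_ratio = errors["backward_euler"][0] / errors["backward_euler"][1]
cn_ratio = errors["crank_nicolson"][0] / errors["crank_nicolson"][1]
```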

284 Danny Barash, Moshe Israeli, and Ron Kimmel

Table 1. $l_2$ Norm Error Estimation.

τ          Linear CN        Nonlin CN        Extrapolation
0.4        0.0079           0.052            0.0262
0.2        0.002            0.024            0.0068
0.1        4.94 · 10^-4     0.0113           0.0024
0.05       1.22 · 10^-4     0.0052           5.6 · 10^-4
0.025      2.9 · 10^-5      0.0022           1.18 · 10^-4
0.0125     5.81 · 10^-6     7.34 · 10^-4     2.37 · 10^-5
0.00625    0                0                0

Fig. 1. Explicit Scheme (Forward-Euler). Left: Original Noisy Signal. Middle:
100 Time Steps of τ = 0.5. Right: 5 Time Steps of τ = 10.0.

Fig. 2. DuFort-Frankel. Left: 8 Time Steps of τ = 0.4. Right: 4 Time Steps of
τ = 0.8.

Fig. 3. Semi-implicit Scheme (Backward Euler). Left: 5 Time Steps of τ = 10.0.
Right: Same as Left, Except the Diffusivity g(s) = (1 + 3s®)^(-1) (s > 0) Was Used.

An Accurate Operator Splitting Scheme for Nonlinear Diffusion Filtering 285

Fig. 4. Crank-Nicolson. Left: 5 Time Steps of τ = 10.0. Right: Same as Left,
Except the Diffusivity g(s) = (1 + 3s®)^(-1) (s > 0) Was Used.

4 Accurate Operator Splitting Schemes

Motivated by ADI jS| which is known as a favorable splitting scheme for the
linear diffusion equation, we wish to combine the merits of the AOS scheme as
a symmetric scheme, together with the family of multiplicative operator
splittings. Multiplicative operator splittings are known in general to be more accurate
than the AOS schemes. We therefore propose a symmetric scheme, mentioned
by Strang in 0, which is both an additive and a multiplicative operator splitting
(AMOS):

$$u^{k+1} = \frac{1}{2}\left[ \left(I - \tau A_1(u^k)\right)^{-1}\left(I - \tau A_2(u^k)\right)^{-1} + \left(I - \tau A_2(u^k)\right)^{-1}\left(I - \tau A_1(u^k)\right)^{-1} \right] u^k. \tag{7}$$

Equation (7) applies the AMOS scheme to the semi-implicit scheme. Such a
combination is known in the literature |2| as the approximate factorization implicit
(AFI) scheme, which is first order accurate in time. However, even in the case
where it is built upon the semi-implicit scheme, the AMOS scheme is expected
to be more accurate than the AOS scheme while preserving symmetry.
Furthermore, it is possible to try to achieve better accuracy by applying the AMOS
scheme to the Crank-Nicolson scheme. At each time step, two calculations are
performed:

$$\begin{aligned}
\left(I - \frac{\tau}{2} A_1(u^k)\right) u^{k*} &= \left(I + \frac{\tau}{2} A_1(u^k)\right) u^k \\
\left(I - \frac{\tau}{2} A_2(u^k)\right) u^{k+1} &= \left(I + \frac{\tau}{2} A_2(u^k)\right) u^{k*},
\end{aligned} \tag{8}$$

and

$$\begin{aligned}
\left(I - \frac{\tau}{2} A_2(u^k)\right) u^{k*} &= \left(I + \frac{\tau}{2} A_2(u^k)\right) u^k \\
\left(I - \frac{\tau}{2} A_1(u^k)\right) u^{k+1} &= \left(I + \frac{\tau}{2} A_1(u^k)\right) u^{k*}.
\end{aligned} \tag{9}$$

After the time step is completed, the two results are averaged, which
ensures a symmetric splitting. Although the directions are not alternating in each
of the two calculations, i.e. the forward and backward Euler are performed on the
same direction, in effect this scheme belongs to the family of alternating
direction implicit (ADI) type methods. In our experiments, alternating the directions
as in the classical ADI produced no better results when applied to nonlinear
diffusion filtering. Therefore, we refer to (8), (9) as ADI, whereas (7) is AFI. We
also note that adding the extrapolation suggested in the one-dimensional case,
as in Table 1, did not increase the order of accuracy to exactly second when
performing quantitative calculations with very small time steps in two dimensions.
At the expense of more computations, one can try to improve the extrapolation
procedure by using the Wynn extrapolation or predictor-corrector methods, such
as Adams Bashforth |2j or Douglas Jones jO], in which the Crank-Nicolson is the
corrector. While these more complicated procedures are costly, it is not obvious
how much accuracy will be gained as a consequence for larger time steps and
whether this will be justifiable. However, practical use in applications mostly
requires large time steps for the filtering, and it turns out that the ADI scheme
in (8), (9) leads to visually better results for such time steps, as can be seen in
Figures 5 and 6. We take 2000 time steps of τ = 0.1 as a reference, then decrease the
number of iterations to check the deviation from the reference. As we decrease
the number of iterations, we observe that the deviation from the converged
result is smaller with the ADI scheme than with the AOS scheme. The filtering effect
becomes stronger in the ADI scheme, while fine details are preserved, which is an
indication that the ADI scheme is visually more accurate than the AOS scheme.
Quantitative examination of the deviations from the reference is calculated as
follows. We start from the original image in Figure 5 (left), which is a texture
image taken from a neutron diffraction experiment. Figures 5 and 6 show the
comparison in terms of accuracy between the AOS, AFI and ADI schemes, which
are discussed next. In terms of speed, actual simulations indicate that the AFI
scheme takes roughly 1.5 times as long as the AOS scheme to perform the
filtering. The ADI scheme takes roughly a factor of 2 to 3 longer than the AOS
scheme in processing some test images. We note
that simply decreasing the time step with the AOS scheme by this ratio does
not produce the fine filtering that is achieved with the ADI scheme. This fact
can be visually observed in practice and will not be reflected in the results of
Table 2, as will be explained in the next paragraph.
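The three splittings can be compared on a toy problem. The sketch below is limited to the linear case, with fixed matrices A1 and A2 standing in for $A_1(u^k)$ and $A_2(u^k)$, so that an exact reference exists. The AOS update used here, $\frac{1}{2}\left[(I - 2\tau A_1)^{-1} + (I - 2\tau A_2)^{-1}\right]u^k$, is the form known from the AOS literature and is stated as an assumption because it does not appear in the text above; the function names, the 8 × 8 grid, the random test image, and the dense solves are likewise illustrative choices.

```python
import numpy as np

def neumann_laplacian(n):
    """1D second-difference matrix with reflecting boundaries."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i] -= 1.0
        a[i + 1, i + 1] -= 1.0
        a[i, i + 1] += 1.0
        a[i + 1, i] += 1.0
    return a

def aos_step(u, a1, a2, tau):
    """Additive splitting (assumed form, two directions)."""
    eye = np.eye(len(u))
    return 0.5 * (np.linalg.solve(eye - 2.0 * tau * a1, u)
                  + np.linalg.solve(eye - 2.0 * tau * a2, u))

def afi_step(u, a1, a2, tau):
    """AMOS on the semi-implicit scheme, following the structure of (7)."""
    eye = np.eye(len(u))
    first = np.linalg.solve(eye - tau * a1, np.linalg.solve(eye - tau * a2, u))
    second = np.linalg.solve(eye - tau * a2, np.linalg.solve(eye - tau * a1, u))
    return 0.5 * (first + second)

def cn_sweep(u, a, tau):
    """One Crank-Nicolson sweep along a single direction."""
    eye = np.eye(len(u))
    return np.linalg.solve(eye - 0.5 * tau * a, (eye + 0.5 * tau * a) @ u)

def adi_step(u, a1, a2, tau):
    """AMOS on Crank-Nicolson: average of the two sweep orders, as in (8), (9)."""
    return 0.5 * (cn_sweep(cn_sweep(u, a1, tau), a2, tau)
                  + cn_sweep(cn_sweep(u, a2, tau), a1, tau))

n, tau, steps = 8, 0.2, 10
lap, eye_n = neumann_laplacian(n), np.eye(n)
a1 = np.kron(lap, eye_n)    # differences along the first image axis
a2 = np.kron(eye_n, lap)    # differences along the second image axis

u0 = np.random.default_rng(0).random(n * n)      # flattened test image
lam, vec = np.linalg.eigh(a1 + a2)
reference = vec @ (np.exp(lam * tau * steps) * (vec.T @ u0))

errors = {}
for name, step in (("AOS", aos_step), ("AFI", afi_step), ("ADI", adi_step)):
    u = u0.copy()
    for _ in range(steps):
        u = step(u, a1, a2, tau)
    errors[name] = float(np.linalg.norm(u - reference))
```

For this small time step and linear operator, the errors are ordered ADI < AFI < AOS. This only illustrates the linear case; the behaviour for nonlinear diffusivities and very large time steps is what the experiments in the text address.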




Fig. 5. Left: Original Texture Image. Middle and Right: Reference for ADI and
AOS/AFI, respectively. In all, nonlinear diffusion filtering was performed with
2000 time steps of τ = 0.1.



Fig. 6. Left: ADI. Middle: AFI. Right: AOS. In all, nonlinear diffusion filtering
was performed with four time steps of τ = 50.0.

Fig. 7. Comparison of error estimation for different time steps based on Table
2: AOS, AFI, ADI. However, note that for the ADI, a different reference image
was used.

Table 2. $l_2$ Norm Error Estimation.

τ        AOS        AFI        ADI
0.25     0.09 %     0.06 %     0.08 %
0.5      0.13 %     0.1 %      0.11 %
1.0      0.17 %     0.14 %     0.13 %
2.0      0.22 %     0.17 %     0.17 %
5.0      0.29 %     0.24 %     0.19 %
10       0.36 %     0.27 %     0.21 %
20       0.47 %     0.32 %     0.23 %
50       0.79 %     0.41 %     0.47 %
100      1.3 %      0.54 %     1.25 %
200      2.07 %     0.81 %     3.14 %

In Table 2, the relative $l_2$ norm errors are calculated for the example in
Figures 5 and 6 as follows. Let $v$ denote the reference solution: AOS, τ = 0.1, in
the case of the AOS and AFI schemes, and ADI, τ = 0.1, in the case of the
ADI scheme. Let $u$ denote the approximate solution in each of the schemes. The
relative error percentages are calculated by

$$\frac{\|u - v\|_2}{\|v\|_2} \cdot 100\%. \tag{10}$$
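A relative $l_2$ error in percent between an approximate result $u$ and a reference $v$ can be computed as in the following minimal sketch. The function name is hypothetical, and the sketch assumes both images are given as arrays of identical shape.

```python
import numpy as np

def relative_l2_error_percent(u, v):
    """Relative l2 deviation of u from the reference v, in percent."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 100.0 * float(np.linalg.norm(u - v) / np.linalg.norm(v))

# Reference image of constant value 2, approximation off by 0.1 everywhere.
reference = np.full((2, 2), 2.0)
approximation = reference + 0.1
error_percent = relative_l2_error_percent(approximation, reference)
```

For 2D arrays, `np.linalg.norm` defaults to the Frobenius norm, which coincides with the $l_2$ norm of the flattened image.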



Note that the small relative error percentage values do not completely reflect the
strength of the deviations and accuracies, since large propagation times produce
smooth images, where the differences between the schemes appear only in small
regions near prominent features within the original image. Moreover, the
comparison with the ADI scheme is done for a separate reference frame, since even
with a small time step the ADI scheme acts as a better filter, as can be seen
in several test images, and hence its reference to measure deviations should be
different. Therefore, Table 2 and the plot in Figure 7 should be analyzed with
caution, especially with respect to the comparison between the ADI and the
AOS/AFI. From Table 2 and Figure 7 it can be observed that up to a time step
of τ = 50.0, the ADI scheme is the most accurate, which is expected because the
Crank-Nicolson is used as its building block. With very large time steps of more
than τ = 50.0, the AFI scheme is the most balanced scheme in deviations from
the corresponding references, probably because the higher order error terms
affect the closeness of the ADI scheme to its reference in Figure 5. Among the
schemes which are based on the semi-implicit scheme as their building block,
the AFI scheme will produce more accurate results than the AOS scheme since
the AMOS scheme is a more accurate splitting scheme than the AOS scheme at
the expense of some increase in computations. Finally, we tried to obtain better
accuracy out of the results in Figure 6 by using Richardson’s extrapolation |21 
for our case 






( 11 ) 



where $R_I(\tau/2)$ denotes an improved result, using a time grid with a spacing of
$\tau/2$ or coarser. $R(\tau/2)$ and $R(\tau)$ are the results of applying nonlinear diffusion
filtering for time steps $\tau/2$ and $\tau$, respectively. Our trials failed to show an
improvement of $R_I(\tau/2)$ relative to $R(\tau/2)$. An improvement is not guaranteed
to begin with, since our equation is nonlinear and the solution is non-smooth.



5 Conclusions 

In this paper, accurate operator splitting schemes were proposed for performing
nonlinear diffusion filtering. They are gradually constructed by reviewing
schemes which are relevant and have been suggested in this context for other
applications. Comparing two splitting schemes, it is found that a higher order of
accuracy can be visually inspected and might become a desirable feature in some
future applications.

The two splitting methods which unconditionally satisfy all discrete scale-space
criteria are Weickert et al.'s [10] AOS scheme and our proposed scheme,
the AMOS scheme. The AOS scheme is more efficient than the AMOS scheme
in its first-order form, the AFI scheme, by approximately a factor of 1.5, and than the
AMOS scheme in its second-order form, the ADI scheme, by a factor of 2 to 3,
depending on the efficiency of the implementation. Although the AOS remains
the simplest and most efficient choice for implementation, in the arsenal of
numerical schemes for performing nonlinear diffusion filtering the AMOS scheme
can be considered as an extension for applications that require high accuracy.
Multiplicative operator schemes are in general more accurate than their additive
counterparts, and the combination of the two in the AMOS schemes ensures
both symmetry and better accuracy at the expense of an increase in execution
time.
Abstract. We develop a novel time-selection strategy for iterative image
restoration techniques: the stopping time is chosen so that the correlation 
of signal and noise in the filtered image is minimised. The new method is 
applicable to any images where the noise to be removed is uncorrelated 
with the signal; no other knowledge (e.g. the noise variance, training 
data etc.) is needed. We test the performance of our time estimation 
procedure experimentally, and demonstrate that it yields near-optimal 
results for a wide range of noise levels and for various filtering methods. 



1 Introduction 

If we want to restore noisy images using some method which starts from the 
input data and creates a set of possible filtered solutions by gradually removing 
noise and details from the data, the crucial question is when to stop the filtering 
in order to obtain the optimal restoration result. The restoration procedures 
needing such a decision include the linear scale space |5], the nonlinear diffusion 
filtering m, and many others. We employ a modified version of the Weickert’s 
edge-enhancing anisotropic diffusion [0| for most experiments in this paper. 

The stopping time T has a strong effect on the diffusion result. Its choice
has to balance two contradictory motivations: a small T gives more trust to the
input data (and leaves more details and noise in the data unfiltered), while a large
T means that the result becomes dominated by the (piecewise) constant model
which is inherent in the diffusion equations. Scale-space researchers often set T
to a large value (ideally infinity) and observe how the diffused function evolves
with time (and converges to a constant value). As we are more concerned with
image restoration and want to obtain nontrivial results from the diffusion
filter, we have to pick a single (finite) time instant T and stop the diffusion
evolution there.

We work with the following model (see Fig. 1): let $\tilde{f}$ be an ideal, noise-free
(discrete) image; this image is observed by some imprecise measurement device
to obtain an image $f$. We assume that some noise $n$ is added to the signal during
the observation so that

$$f = \tilde{f} + n. \tag{1}$$

Furthermore, we assume that the noise $n$ is uncorrelated with the signal $\tilde{f}$, and
that the noise has zero mean value, $E(n) = 0$.¹

Fig. 1. Model of the time-selection problem for the diffusion filtering. We want
to select the filtered image u(T) which is as close as possible to the ideal
signal $\tilde{f}$.

* This research was supported by the Czech Ministry of Education under project
LN00B096.

The diffusion filtering starts with the noisy image as its initial condition, 
u(0) = f, and the diffusion evolves along some trajectory u(t). This trajectory 
depends on the diffusion parameters and on the input image; the optimistic 
assumption is that the noise will be removed from the data before any important 
features of the signal commence to deteriorate significantly, so that the diffusion 
leads us somewhere ‘close’ to the ideal data. This should be the case if the signal 
adheres to the piecewise constant model inherent in the diffusion equation. 

The task of the stopping time selection can be formulated as follows: select
that point u(T) of the diffusion evolution which is nearest to the ideal signal $\tilde{f}$.
Obviously, the ideal signal is normally not available; the optimal stopping time
T can only be estimated by some criteria, and the distance² between the ideal
and the filtered data serves only in the experiments to evaluate the performance
of the estimation procedure.

¹ Let us review the statistical definitions used in the paper (see e.g. Papoulis 0). For
the statistical computations on images, we treat the pixels of an image as
independent observations of a random variable.

The mean or expectation of a vector $x$ is $\bar{x} = E(x) = \frac{1}{N}\sum_{i=1}^{N} x_i$.

We define the variance of a signal $x$ as $\mathrm{var}(x) = E\left[(x - \bar{x})^2\right]$.

The covariance of two vectors $x$, $y$ is given by $\mathrm{cov}(x, y) = E\left[(x - \bar{x}) \cdot (y - \bar{y})\right]$.

The normalized form of the covariance is called the correlation coefficient,
$\mathrm{corr}(x, y) = \dfrac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x) \cdot \mathrm{var}(y)}}$.

² In the experiments below, we measure the distance of two images by the mean
absolute deviation, $\mathrm{MAD}(x - y) = E(|x - y|)$.
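The definitions in the footnotes translate directly into a few NumPy helpers. The sketch below treats the pixels of an image as observations, as described above; the function names are illustrative choices.

```python
import numpy as np

def mean(x):
    return float(np.mean(np.asarray(x, dtype=float)))

def variance(x):
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 2))

def covariance(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - x.mean()) * (y - y.mean())))

def correlation(x, y):
    """Correlation coefficient: covariance normalized by both variances."""
    return covariance(x, y) / np.sqrt(variance(x) * variance(y))

def mad(x, y):
    """Mean absolute deviation between two images."""
    return float(np.mean(np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
```

For any two signals, `variance(a + b)` equals `variance(a) + variance(b) + 2 * covariance(a, b)`, so the variances simply add exactly when the covariance vanishes.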
In the following paragraphs we first cite the approaches to stopping time
selection which have appeared in the literature, and comment on them. Then
we develop a novel and reliable time-selection strategy based on signal-noise
decorrelation.



2 Previous Work 



In the diffusion model of Catte et al. P, the image gradient for the diffusivity
computation is regularized by convolution with a Gaussian smoothing kernel
$G_\sigma$. The authors argue that this regularization introduces a sort of time: the
result of convolution is the same as the solution to the linear heat equation at
time $t = \sigma^2/2$, so it is coherent to correlate the stopping time T and the ‘time’
$\sigma^2/2$ of the linear diffusion. However, the equality $T = \sigma^2/2$ is rather a lower estimate
of the stopping time: because the diffusion process is inhibited near edges, the
nonlinear diffusion is always slower than the linear one, and needs a longer time
to reach the desired results.

Dolcetta and Ferretti recently formulated the time selection problem as a
minimization of the functional

$$E(T) = \int_0^T E_c \, dt + E_s \tag{2}$$

where $E_c$ is the computing cost and $E_s$ the stopping cost, the latter
encouraging filtering for small T. The authors provide a basic example $E_c = c$,
$E_s = -\left(\int_\Omega |u(x,T) - u(x,0)|^2 \, dx\right)$, where the constant $c$ balancing the influence of
the two types of costs has to be computed from a typical image to be filtered.

Sporring and Weickert in [Z] study the behaviour of generalized entropies, and 
suggest that the intervals of minimal entropy change indicate especially stable 
scales with respect to evolution time. They estimate that such scales could be 
good candidates for stopping times in nonlinear diffusion scale spaces. However, 
as the entropy can be stable on whole intervals, it may be difficult to decide on 
a single stopping instant from that interval; we are unaware of their idea being 
brought to practice in the field of image restoration. 

Weickert mentioned more ideas on the stopping time selection, more closely 
linked to the noise-filtering problem, in m- They are based on the notion of 
relative variance. 

The variance var(u(t)) of an image u(t) is monotonically decreasing with t
and converges to zero as $t \to \infty$. The relative variance

$$r(u(t)) = \frac{\mathrm{var}(u(t))}{\mathrm{var}(u(0))} \tag{3}$$

decreases monotonically from 1 to 0 and can be used to measure the distance of
u(t) from the initial state u(0) and the final state u(∞). Prescribing a certain
value for r(u(T)) can therefore serve as a criterion for selection of the stopping
time T.
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Let again f be the ideal data, the measured noisy image f = f + n, and let the 
noise n be of zero mean and uncorrelated with f . Now assume that we know the 
variance of the noise, or (equivalently, on the condition that the noise and the 
signal are uncorrelated) the signal-to-noise ratio, defined as the ratio between 
the original image variance and the noise variance. 



SNR = 



var ( f ) 
var(n) 



( 4 ) 



As the signal f and the noise n are uncorrelated, we have 



var(f) = var( f ) + var(n). 



( 5 ) 



Substituting from this equality for var(n) into we obtain by simple rear- 
rangement that 



var(f) 1 

var(f) “ 1 + sm ' 



( 6 ) 



We take the noisy image for the initial condition of our diffusion filter, u(0) =
f. An ideal diffusion filter would first eliminate the noise before significantly
affecting the signal; if we stop at the right moment, we might substitute the
filtered data u(T) for the ideal signal $\tilde f$ in (6). Relying on this analogy, we can
choose the stopping time T such that the relative variance satisfies

$$ r(u(T)) = \frac{\operatorname{var}(u(T))}{\operatorname{var}(u(0))} = \frac{1}{1 + \frac{1}{\mathrm{SNR}}}. \tag{7} $$



Weickert remarks that the criterion (7) tends to underestimate the optimal
stopping time, as even a well-tuned filter cannot avoid influencing the signal
before eliminating the noise.
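
As an illustration only, the relative-variance rule can be sketched in a few lines of Python. The sketch is not the implementation used in the experiments: it assumes a 1D signal, uses explicit linear diffusion as a stand-in for the nonlinear filter, and the helper names (`variance`, `diffuse_step`, `snr_stopping_time`) as well as the synthetic step-edge data are hypothetical.

```python
import random

def variance(u):
    """Population variance of a 1D signal."""
    m = sum(u) / len(u)
    return sum((v - m) ** 2 for v in u) / len(u)

def diffuse_step(u, tau):
    """One explicit step of 1D linear diffusion (unit grid, reflecting borders)."""
    n = len(u)
    return [u[i] + tau * (u[max(i - 1, 0)] - 2.0 * u[i] + u[min(i + 1, n - 1)])
            for i in range(n)]

def snr_stopping_time(f, snr, tau=0.25, max_steps=10000):
    """Diffuse until var(u(t)) / var(u(0)) drops to 1 / (1 + 1/SNR)."""
    target = 1.0 / (1.0 + 1.0 / snr)
    var0 = variance(f)
    u, t = list(f), 0.0
    for _ in range(max_steps):
        if variance(u) / var0 <= target:
            break
        u = diffuse_step(u, tau)
        t += tau
    return t, u

# Synthetic data (an assumption of this sketch): step edge plus Gaussian noise.
random.seed(0)
clean = [0.0] * 50 + [1.0] * 50
noise = [random.gauss(0.0, 0.3) for _ in clean]
noisy = [c + e for c, e in zip(clean, noise)]
snr = variance(clean) / variance(noise)  # known only because the data are synthetic
T, u_T = snr_stopping_time(noisy, snr)
print("stopping time:", T)
```

Plain lists keep the sketch free of dependencies; for images an array library would be the natural choice. The rule needs the SNR as an input, which is exactly the prior knowledge the criterion depends on.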

So much for Weickert's suggestions from [?]: knowing the SNR, we decide to
filter the image until some distance from the noisy data is reached, and the
formula (7) tells us when to stop the diffusion. This idea seems natural and
also resembles that used in the total variation minimizing methods (see the overview
in [?, pp. 50-52]). However, our experiments indicate that this approach does
not usually yield the optimal stopping time. Let us study in more detail why the
problems occur.



3 Decorrelation Criterion 

The equality (5) and hence the equation (6) are valid only if the signal and the
noise are uncorrelated. This assumption holds for $\tilde f$ and n, but not necessarily
for the filtered signal u(T) and the difference u(0) - u(T); the latter is needed
for the equation (7) to be justified. In other words (if we mentally substitute the
filtered function u(T) for $\tilde f$, the difference $n_u$ = u(0) - u(T) for the noise n, and
u(0) for f in (5) and (6)), the formula (7) is useful only if the random variables
u(T) and (u(0) - u(T)) are uncorrelated.

Fig. 2. Experimental Data. Left: Cymbidium image (courtesy Michal Haindl).
Right: Noisy input image ($[0,127]^2 \to [0,255]$) for the 'Triangle and rectangle'
experiment. Noise with uniform distribution in the range [-255, 255] was added
to two-valued synthetic data.

Inspired by these observations, we arrive at the following idea: if the unknown
noise n is uncorrelated with the unknown signal $\tilde f$, wouldn't it be reasonable to
minimize the covariance of the 'noise' (u(0) - u(t)) with the 'signal' u(t), or,
better, employ its normalized form, the correlation coefficient

$$ \operatorname{corr}(u(0) - u(t), u(t)) = \frac{\operatorname{cov}(u(0) - u(t), u(t))}{\sqrt{\operatorname{var}(u(0) - u(t)) \cdot \operatorname{var}(u(t))}}, \tag{8} $$

and choose the stopping time T so that the expression (8) is as small as
possible? This way, instead of determining the stopping time so that (u(0) - u(T))
satisfies a quantitative property and its variance is equal to the known variance
of the noise n, we try to enforce a qualitative feature: if the ideal $\tilde f$ and n were
uncorrelated, we require that their artificial substitutes u(T) and (u(0) - u(T))
reveal the same property, to the extent possible, and select

$$ T = \arg\min_t \operatorname{corr}(u(0) - u(t), u(t)). \tag{9} $$

Let us test and validate this new stopping time criterion experimentally.

4 Experiments

We added various levels of Gaussian noise to the cymbidium image shown in
Fig. 2 left, filtered it by nonlinear diffusion (more precisely, a modified version of
Weickert's edge-enhancing anisotropic diffusion [?], numerically implemented
using the AOS scheme [?]), and observed how the signal-noise correlation
measured by equation (8) develops with the diffusion time. A typical example is
drawn in Fig. 3 left: the plot of the MAD criterion of the filtering quality
coincides very well with the graph of the correlation coefficient
corr(u(0) - u(t), u(t)).

Fig. 3. Left: The distance MAD(u(t) - $\tilde f$) (solid line) and the correlation
coefficient corr(u(0) - u(t), u(t)) (dashed line) developing with the diffusion time.
Right: The stopping time $T_{SNR}$ determined by the SNR method (dotted with
crosses), and $T_{corr}$ obtained through the covariance minimization (dotted with
diamonds) compared to the optimal stopping time $T_{opt}$ (solid line). The graphs
are plotted against the standard deviation of noise in the input image.
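
The decorrelation rule itself is easy to prototype. The following Python sketch is a hedged illustration rather than the code behind the experiments: it assumes a 1D signal, replaces the edge-enhancing anisotropic diffusion and the AOS scheme by a plain explicit linear diffusion step, and uses hypothetical names (`correlation`, `decorrelation_stopping_time`) and synthetic data.

```python
import math
import random

def covariance(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    """Correlation coefficient: covariance normalized by both standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

def diffuse_step(u, tau):
    """One explicit step of 1D linear diffusion (unit grid, reflecting borders)."""
    n = len(u)
    return [u[i] + tau * (u[max(i - 1, 0)] - 2.0 * u[i] + u[min(i + 1, n - 1)])
            for i in range(n)]

def decorrelation_stopping_time(f, tau=0.25, n_steps=200):
    """Return (T, u(T), history) with T = argmin_t corr(u(0) - u(t), u(t))."""
    u = list(f)
    history = []                       # (time, correlation) pairs
    best = (None, float("inf"), None)  # (T, correlation at T, u(T))
    for step in range(1, n_steps + 1):  # t = 0 is skipped: u(0) - u(0) has no variance
        u = diffuse_step(u, tau)
        residual = [a - b for a, b in zip(f, u)]  # the 'noise' u(0) - u(t)
        c = correlation(residual, u)
        history.append((step * tau, c))
        if c < best[1]:
            best = (step * tau, c, u)
    return best[0], best[2], history

# Synthetic data (an assumption of this sketch): step edge plus Gaussian noise.
random.seed(0)
clean = [0.0] * 50 + [1.0] * 50
noisy = [c + random.gauss(0.0, 0.3) for c in clean]
T, u_T, history = decorrelation_stopping_time(noisy)
print("stopping time:", T)
```

Unlike the SNR rule, the function takes no noise variance as input; it only inspects the filtered signal and the residual. The fixed horizon `n_steps` is a simplification of this sketch.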

A more thorough study of the performance of the stopping time selection
criteria (measured again on the cymbidium data) is shown in Figs. 3 right and 4.
The former compares three stopping times: the optimal $T_{opt}$ is the time instant
for which the filtered image u(t) is closest to the noise-free $\tilde f$ in the MAD
distance; obviously, $T_{opt}$ can be found only in the artificial experimental setting,
since the noise-free $\tilde f$ is normally not available. The second stopping time $T_{SNR}$ is
determined using the criterion (7) (which requires the knowledge of the noise
variance or SNR). The stopping time $T_{corr}$ minimizes the correlation coefficient
of equation (8). All alternative stopping times are computed for a series of
input images with varied amounts of noise present. While the SNR method easily
underestimates or overestimates the optimal stopping time (depending on the
amount of noise in the input data), the correlation minimization leads to
near-optimal results for all noise levels. The graph is plotted for the iteration time step
$\tau = 0.5$; other choices $\tau \in \{0.1, 1\}$ gave similar results.

The actually obtained quality measure MAD(u(T) - $\tilde f$) is shown in Fig. 4,
again with $\tau = 0.5$. You can see that for all noise levels the correlation-estimated
time leads to filtering results very close to the optimal values obtainable by the
nonlinear diffusion.

Let us return for a moment to Fig. 3 left. At the beginning of the diffusion
filtering, the correlation coefficient declines fast until it reaches its minimum.
Fig. 4. Left: The MAD distance of the filtered data from the ideal noise-free
image, MAD(u(T) - $\tilde f$), using the SNR and the correlation-minimization time
selection strategies. Right: the difference between the estimated result and the
optimal one, MAD(u(T) - u($T_{opt}$)).



If for some data the graph behaves differently, it may hint at a problem.
As an example, we observed that if there is only a small amount of
noise in the input image, the correlation corr(u(0) - u(t), u(t)) might grow from
the first iterations. In such a case, the iteration time step $\tau$ has to be decreased
adaptively and the diffusion restarted from time t = 0 until the correlation plot
exhibits a clear minimum.

Another experiment compares the results of different diffusion algorithms
filtering an originally black and white image with non-Gaussian additive noise.
The input data are shown in Fig. 2 right: the noisy image was obtained by adding
noise of uniform distribution in the range [-255, 255] to the ideal input, and by
restricting the noisy values to the interval [0, 255].

In Fig. 5 the noise is smoothed by linear diffusion, isotropic nonlinear
diffusion, and two anisotropic diffusion filters; the grey-values are stretched to the
whole interval [0, 255] so that a higher contrast between the dark and bright
regions corresponds to a better noise-filtering performance. In all cases, the
stopping time was determined autonomously by the signal-noise decorrelation
criterion (9). You can see that in all cases, although quite different filtering
algorithms were employed, the stopping criterion leads to results where most of the
noise is removed and the ideal signal becomes apparent or suitable for further
processing; we support this statement by showing the thresholded content of the
filtered images in Fig. 6.

The stopping criterion was designed to minimize the MAD distance from the
ideal function. If visual quality was the goal to be achieved, we would probably
stop the diffusion later, especially as far as linear diffusion (Fig. 5a) and Weickert's
edge-enhancing anisotropic diffusion [?] with the maximum amount of diffusion in
the coherence direction ($\varphi_2 = 1$, Fig. 5c) are concerned. We find however that the
MAD distance and the visual quality are in good agreement in Fig. 5d, which
represents the result of the edge-enhancing diffusion with a smaller amount of
diffusion in the coherence direction, $\varphi_2 = 0.2$. Because of limited space, we have
to refer the reader to Pavel Mrázek's thesis [?] for details on the filtering
procedures and for more experimental results verifying the decorrelation criterion.

Fig. 5. Comparing the different diffusion algorithms on the noisy data of Fig. 2
right, all with the stopping time selected autonomously by minimizing the
criterion (8): (a) linear diffusion, T = 3.8; (b) isotropic nonlinear diffusion, T = 125;
(c) anisotropic NL diffusion, $\varphi_2 = 1$, T = 15; (d) anisotropic NL diffusion,
$\varphi_2 = 0.2$, T = 32.
In (b)-(d), the parameters $\sigma = 1$, $\tau = 1$ were employed, and the parameter $\lambda$
was estimated using the Perona-Malik procedure from percentile p = 0.9 in each
step.

Fig. 6. Thresholded versions of the images in Fig. 5.

5 Conclusion 

We have developed a novel method to estimate the optimal stopping time for 
iterative image restoration techniques such as nonlinear diffusion. The stopping 
time is chosen so that the correlation of signal u(T) and ‘noise’ (u(0) — u(T)) 
is minimised. The new criterion outperforms other time selection strategies and
yields near-optimal results for a wide range of noise levels and filtering
parameters. The decorrelation criterion is also more general, being based only on the
assumption that the noise and the signal in the input image are uncorrelated;
no knowledge of the noise variance and no training images are needed to
tune any parameters of the method.
Abstract. A framework that naturally unifies smoothing and enhancement
processes is presented. We generalize the linear and nonlinear scale
spaces in the complex domain, by combining the diffusion equation with
the simplified Schrödinger equation. A fundamental solution for the linear
case is developed. Preliminary analysis of the complex diffusion shows
that the generalized diffusion has properties of both forward and inverse
diffusion. An important observation, supported theoretically and numerically,
is that the imaginary part can be regarded as an edge detector
(smoothed second derivative), after rescaling by time, when the complex
diffusion coefficient approaches the real axis. Based on this observation,
a nonlinear complex process for ramp preserving denoising is developed.
Keywords: Scale-space, image filtering, image denoising, image enhancement,
nonlinear diffusion, complex diffusion.



1 Introduction 

The scale-space approach is by now a well established multi-resolution technique
for image structure analysis (see [?]). Originally, the Gaussian representation
introduced a scale dimension by convolving the original image with a
Gaussian of a standard deviation $\sigma = \sqrt{2t}$. This is analogous to solving the
linear diffusion equation

$$ I_t = c\nabla^2 I, \quad I|_{t=0} = I_0, \quad 0 < c \in \mathbb{R}, \tag{1} $$

with a constant diffusion coefficient c = 1.
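
The link between diffusion time and Gaussian width can be checked numerically. The sketch below is only an illustration under stated assumptions (1D, unit grid spacing, explicit Euler time stepping, hypothetical function names): it diffuses a unit impulse with c = 1 and measures the variance of the resulting kernel, which should equal $\sigma^2 = 2t$.

```python
def diffuse(u, t, tau=0.25):
    """Explicit Euler solution of I_t = I_xx (c = 1, unit grid, zeros beyond the borders)."""
    for _ in range(round(t / tau)):
        padded = [0.0] + list(u) + [0.0]
        u = [padded[i + 1] + tau * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
             for i in range(len(u))]
    return u

def kernel_variance(h):
    """Second central moment of a discrete kernel h."""
    mass = sum(h)
    mean = sum(i * v for i, v in enumerate(h)) / mass
    return sum((i - mean) ** 2 * v for i, v in enumerate(h)) / mass

impulse = [0.0] * 201
impulse[100] = 1.0
t = 4.0
kernel = diffuse(impulse, t)
sigma2 = kernel_variance(kernel)
print(sigma2, 2.0 * t)  # the two values agree up to rounding
```

Each explicit step convolves the signal with the three-tap kernel $(\tau, 1 - 2\tau, \tau)$ of variance $2\tau$, and variances add under convolution, which is why the agreement is exact here and not merely asymptotic.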

Perona and Malik (P-M) [?] proposed a nonlinear adaptive diffusion process,
where diffusion takes place with a variable diffusion coefficient in order to reduce
the smoothing effect near edges. The P-M nonlinear diffusion equation is of the
form $I_t = \nabla \cdot (c(|\nabla I|)\nabla I)$, $c(\cdot) > 0$, where c is a decreasing function of the
gradient magnitude. Our aim is to see if the linear and nonlinear scale-spaces can be viewed
as special cases of a more general theory of complex diffusion-type processes.

Complex diffusion-type processes are encountered, e.g., in quantum physics and
in electro-optics. The time dependent Schrödinger equation is the fundamental
equation of quantum mechanics. In the simplest case for a particle without spin
in an external field it has the form

$$ i\hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\Delta\psi + V(x)\psi, \tag{2} $$

where $\psi = \psi(t, x)$ is the wave function of a quantum particle, m is the mass
of the particle, $\hbar$ is Planck's constant, V(x) is the external field potential, $\Delta$ is
the Laplacian and $i = \sqrt{-1}$. With an initial condition $\psi|_{t=0} = \psi_0(x)$, requiring
that $\psi(t, \cdot) \in L^2$ for each fixed t, the solution is $\psi(t, \cdot) = e^{-\frac{i}{\hbar}tH}\psi_0$, where the
exponent is a shorthand for the corresponding power series, and the higher order
terms are defined recursively by $H^n\psi = H(H^{n-1}\psi)$. The operator

$$ H = -\frac{\hbar^2}{2m}\Delta + V(x), \tag{3} $$

called the Schrödinger operator, is interpreted as the energy operator of the
particle under consideration. The first term is the kinetic energy and the second is
the potential energy. The duality relations that exist between the Schrödinger
equation and diffusion theory have been studied in [?]. Another important
complex PDE in the field of phase transitions of traveling wave systems is the complex
Ginzburg-Landau equation (CGL): $u_t = (1 + i\nu)u_{xx} + Ru - (1 + i\mu)|u|^2 u$. Note
that although these flows have a diffusion structure, because of the complex
coefficient, they retain wave propagation properties.

In both cases a non-linearity is introduced by adding a potential term while 
the kinetic energy stays linear. In this study we employ the equation with zero 
potential (no external field) but with non-linear “kinetic energy” . To better un- 
derstand the complex flow, we study in Section 2 the linear case and derive the 
fundamental solution. We show that for small imaginary part the flow is approx- 
imately a linear real diffusion for the real part while the imaginary part behaves 
like a second derivative of the real part. Indeed as expected, the imaginary part 
is directly related to the localized phase and zero crossings of the image, and 
this is one of the important properties obtained by generalizing the diffusion 
approach to the complex case. The non-linear case is studied in Section 3 and 
the intuition gained from the linear case is used in order to construct a special 
non-linear complex diffusion scheme which preserves ramps. The advantage over 
higher order PDE’s and over the P-M algorithm is demonstrated in one- and 
two-dimensional examples. 

2 Linear Complex Diffusion 

2.1 Problem Definition 

We consider the following initial value problem:

$$ I_t = cI_{xx}, \quad t > 0, \quad x \in \mathbb{R}; \qquad I(x; 0) = I_0 \in \mathbb{R}, \quad c, I \in \mathbb{C}. \tag{4} $$

This equation is a generalization of two equations: the linear diffusion equation
(1) for $c \in \mathbb{R}$ and the simplified Schrödinger equation, i.e. purely imaginary c and $V(x) = 0$.
When $c \in \mathbb{R}$ there are two cases: for c > 0 the process is a well posed forward
diffusion, whereas for c < 0 an ill posed inverse diffusion process is obtained.



2.2 Fundamental Solution 

We seek the complex fundamental solution h(x; t) that satisfies the relation

$$ I(x; t) = I_0 * h(x; t), \tag{5} $$

where * denotes convolution. We rewrite the complex diffusion coefficient as
$c = re^{i\theta}$, and, since there does not exist a stable fundamental solution of the
inverse diffusion process, restrict ourselves to a positive real part of c, that is
$\theta \in (-\frac{\pi}{2}, \frac{\pi}{2})$. Replacing the real time variable t by a complex time $\tau = ct$, we get
$I_\tau = I_{xx}$, $I(x; 0) = I_0$. This is the linear diffusion equation with the Gaussian
function being its fundamental solution. Reverting back to t, we get:

$$ h(x; t) = \frac{K}{2\sqrt{\pi t c}}\, e^{-x^2/(4tc)}, \tag{6} $$



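where $K \in \mathbb{C}$ is a constant calculated according to the initial conditions. For
$c \in \mathbb{R}$ we have K = 1. Separating the real and imaginary exponents we get:

$$ h(x; t) = \frac{K e^{-i\theta/2}}{2\sqrt{\pi t r}}\, e^{-x^2\cos\theta/(4tr)}\, e^{i x^2\sin\theta/(4tr)} = K A\, g_\sigma(x; t)\, e^{i\alpha(x; t)}, \tag{7} $$

where $A = \frac{e^{-i\theta/2}}{\sqrt{\cos\theta}}$, $g_\sigma(x; t) = \frac{1}{\sqrt{2\pi}\,\sigma(t)}\, e^{-x^2/(2\sigma^2(t))}$, and

$$ \alpha(x; t) = \frac{x^2 \sin\theta}{4tr}, \qquad \sigma(t) = \sqrt{\frac{2tr}{\cos\theta}}. $$

Satisfying the initial condition $I(x; 0) = I_0$ requires $h(x; t \to 0) = \delta(x)$. Since
$\lim_{t \to 0} g_\sigma(x; t) = \delta(x)$, we should require K = 1/A (indeed K = 1 for the
case of positive real c ($\theta = 0$)). The fundamental solution is therefore:

$$ h(x; t) = g_\sigma(x; t)\, e^{i\alpha(x; t)}, \tag{8} $$

with the Gaussian's standard deviation $\sigma$ and exponent function $\alpha$ as in (7).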



2.3 Approximate Solution for Small Theta 

We will now show that as $\theta \to 0$ the imaginary part can be regarded as a
smoothed second derivative of the initial signal, factored by $\theta$ and the time
t. Generalizing the solution to any dimension with Cartesian coordinates
$x = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^N$, $I(x; t) \in \mathbb{C}$, and denoting that in this coordinate system
$g_\sigma(x; t) = \prod_{i=1}^{N} g_\sigma(x_i; t)$, we show that:

$$ \lim_{\theta \to 0} \frac{\mathrm{Im}(I)}{\theta} = t\Delta g_{\tilde\sigma} * I_0, \tag{9} $$

where $\mathrm{Im}(\cdot)$ denotes the imaginary value and $\tilde\sigma = \lim_{\theta \to 0}\sigma = \sqrt{2t}$. For
convenience we use here a unit complex diffusion coefficient $c = e^{i\theta}$. We use the
following approximations for small $\theta$: $\cos\theta = 1 + O(\theta^2)$ and $\sin\theta = \theta + O(\theta^3)$.
Introducing an operator H, which is similar to the Schrödinger operator, we can
write equation (4) (in any dimension) as: $I_t = HI$; $I|_{t=0} = I_0$, where $H = c\Delta$.
The solution is $I = e^{tH}I_0$, and is the equivalent of (5), (8). Using the above
approximations we get:

$$ I(x, t) = e^{ct\Delta}I_0 = e^{e^{i\theta}t\Delta}I_0 \approx e^{(1 + i\theta)t\Delta}I_0 = e^{t\Delta}e^{i\theta t\Delta}I_0 \approx e^{t\Delta}(1 + i\theta t\Delta)I_0 = (1 + i\theta t\Delta)\, g_{\tilde\sigma} * I_0. $$

A thorough analysis of the approximation error with respect to time and 9 
will be presented elsewhere. We should comment that part of the error depends 
on the higher order derivatives (4th and higher) of the signal, but, as these 
derivatives are decaying exponentially by the Gaussian convolution, this error 
diminishes quickly with time. Numerical experiments show that for $\theta = \pi/30$
the peak error is about 0.1% for the real part and 3-5% for the imaginary part
(depending on the signal). Though the peak value error of the imaginary part 
seems large, the zero crossing location remains essentially accurate. 
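
The small theta relation can be examined on a single Fourier mode, for which the linear complex diffusion has a closed-form solution. The Python sketch below is an illustration under assumptions of our own (1D, unit coefficient $c = e^{i\theta}$, input $\cos(kx)$, hypothetical function names): with $s = k^2 t$ the mode is multiplied by $\exp(-e^{i\theta}s)$, while the relation above predicts the factor $-s\,e^{-s}$ for $\mathrm{Im}(I)/\theta$.

```python
import cmath
import math

def exact_multiplier(s, theta):
    """Factor multiplying cos(kx) after time t under I_t = e^{i theta} I_xx, s = k^2 t."""
    return cmath.exp(-cmath.exp(1j * theta) * s)

def predicted_imag_over_theta(s):
    """Predicted factor for Im(I)/theta: t * Laplacian of the Gaussian-smoothed mode,
    i.e. t * (-k^2) * exp(-k^2 t) = -s * exp(-s)."""
    return -s * math.exp(-s)

def relative_error(s, theta):
    exact = exact_multiplier(s, theta).imag / theta
    predicted = predicted_imag_over_theta(s)
    return abs(exact - predicted) / abs(predicted)

for s in (0.5, 1.0, 2.0):
    print(s, relative_error(s, math.pi / 30), relative_error(s, math.pi / 300))
```

A single mode is of course more benign than a real signal, so the printed errors should not be read as the peak errors quoted above; the sketch only shows that the discrepancy shrinks as $\theta$ decreases.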

Some further insight into the behavior of the small theta approximation can
be gained by separating the real and imaginary parts of the signal and the diffusion
coefficient into a set of two equations. Assigning $I = I_R + iI_I$, $c = c_R + ic_I$, we
get

$$ \begin{cases} I_{Rt} = c_R I_{Rxx} - c_I I_{Ixx}, & I_R|_{t=0} = I_0, \\ I_{It} = c_I I_{Rxx} + c_R I_{Ixx}, & I_I|_{t=0} = 0, \end{cases} \tag{10} $$

where $c_R = \cos\theta$, $c_I = \sin\theta$. The relation $I_{Rxx} \gg \theta I_{Ixx}$ holds for small enough
$\theta$, which allows us to omit the right term of the first equation to get the small
theta approximation:

$$ I_{Rt} \approx I_{Rxx}; \qquad I_{It} \approx I_{Ixx} + \theta I_{Rxx}. \tag{11} $$

In (11) $I_R$ is controlled by a linear forward diffusion equation, whereas $I_I$ is
affected by both the real and imaginary equations. We can regard the imaginary
part as $I_{It} \approx \theta I_{Rxx}$ + ("a smoothing process").

2.4 Examples 

We present examples of 1D and 2D signal processing with complex diffusion
processes characterized by small and large values of $\theta$. In Fig. 1 a unit step is
processed with small and large $\theta$ ($\pi/30$ and $14\pi/30$, respectively). In Figs. 2 and 3 the
cameraman image is processed with the same $\theta$ values. The qualitative edge detection
(smoothed second derivative) properties are clearly apparent in the imaginary
part for the small $\theta$ value, whereas the real value depicts the properties of
ordinary Gaussian scale-space. For large $\theta$, however, the imaginary part feeds
back into the real part significantly, creating wave-like structures. In addition,
the signal exceeds the original maximum and minimum values, violating the
"maximum-minimum" principle, a property suitable for sharpening purposes.

Fig. 1. Complex diffusion applied to a step signal. From left to right: small $\theta$
($\theta = \pi/30$) real and imaginary values, large $\theta$ ($\theta = 14\pi/30$) real and imaginary
values. Each frame depicts from top to bottom: original step, diffused signal after
times: 0.025, 0.25, 2.5, 25.



3 Nonlinear Complex Diffusion



Nonlinear complex processes can be derived from the above mentioned properties
of the linear complex diffusion for purposes of signal and image denoising or
enhancement. We suggest an example of a nonlinear process for denoising ramp
edges (as opposed to the widely used methods for denoising step edges).

We are looking for a general nonlinear diffusion equation

$$ I_t = \frac{\partial}{\partial x}\left(c(\cdot)\, I_x\right) \tag{12} $$

that preserves smoothed ramps. Following the same logic that utilized a
gradient measure in order to slow the diffusion near step edges, we search for a
suitable differential operator $\mathcal{D}$ for ramp edges. Eq. (12) with a diffusion
coefficient $c(|\mathcal{D}I|)$ which is a decreasing function of $|\mathcal{D}I|$ can be regarded as a ramp
preserving process. We begin by examining the gradient as a possible
candidate, concluding that it is not a suitable measure for two reasons. First, the gradient
does not detect the main features of the ramp, namely its endpoints. Second, it
has a nearly uniform value across the whole smoothed ramp, causing a nonlinear
gradient-dependent diffusion to slow the diffusion process in that region, so that
noise within a ramp is not properly reduced (creating staircasing effects).
The second derivative (Laplacian in multiple dimensions) is a suitable choice: it
has a high magnitude near the endpoints and low magnitude everywhere else,
and thus enables the nonlinear diffusion process to reduce noise within a ramp.

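We formulate c(s) as a decreasing function of s:

$$ c(s) = \frac{1}{1 + s^2}, \quad \text{where } c(s) = c(|I_{xx}|). \tag{13} $$

Using the c of (13) in (12) we get:

$$ I_t = \frac{\partial}{\partial x}\left(\frac{I_x}{1 + I_{xx}^2}\right) = \frac{1 + I_{xx}^2 - 2I_x I_{xxx}}{(1 + I_{xx}^2)^2}\, I_{xx}. \tag{14} $$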
Fig. 2. Complex diffusion of the cameraman image for small theta ($\theta = \pi/30$).
Top - real values, bottom - imaginary values (factored by 20). Each frame (from
left to right): original, image after times: 0.25, 2.5, 25.

Fig. 3. Complex diffusion of the cameraman image for large theta ($\theta = 14\pi/30$).
Top - real values, bottom - imaginary values (factored by 20). Each frame (from
left to right): original, image after times: 0.25, 2.5, 25.
There are two main problems in this scheme. The first and more important
one is the fact that noise has very large (theoretically unbounded) second
derivatives. Secondly, a numerical problem arises as third derivatives should be
computed, with large numerical support and noisier derivative estimations. These
two problems are solved by using the nonlinear complex diffusion.

Following the results of the linear complex diffusion (Eq. (9)), we estimate the
smoothed second derivative multiplied by the time t by the imaginary value of
the signal divided by $\theta$.

Whereas for small t this term vanishes, allowing stronger diffusion to reduce
the noise, with time its influence increases, preserving the ramp features of the
signal. We should comment that these second derivative estimations are more
biased than in the linear case, as we have a nonlinear process.

The equation for the multidimensional process is

$$ I_t = \nabla \cdot \left(c(\mathrm{Im}(I))\nabla I\right), \qquad c(\mathrm{Im}(I)) = \frac{e^{i\theta}}{1 + \left(\frac{\mathrm{Im}(I)}{k\theta}\right)^2}, \tag{15} $$

where k is a threshold parameter. The phase angle $\theta$ should be small ($\theta \ll 1$).
Since the imaginary part is normalized by $\theta$, the process is not affected much by
changing the value of $\theta$ as long as it stays small.

We implement this flow with a forward Euler scheme with central difference
approximation for the spatial derivatives and backward time derivative. Care
should be exercised when choosing the time step. The fundamental solution
includes a Gaussian with variance $\sigma^2 = \frac{2tr}{\cos\theta}$. Implementing Gaussian convolution
of time $\tau$ by incremental time steps where $\sigma^2 = 2\tau$ requires the time step bound
to be $\Delta t < 0.25h^2$ (in 2 dimensions, where h is the spatial step). Here we have
$\tau = \frac{tr}{\cos\theta}$ and hence in the general case we require $\Delta t < 0.25h^2\frac{\cos\theta}{r}$, and for our
case where r = 1, h = 1: $\Delta t < 0.25\cos\theta$.

This means that when $\theta$ approaches $\pi/2$ it is very inefficient to implement
complex diffusion with incremental time-steps. For small $\theta$ there is essentially
no difference from real diffusion (this works also in the nonlinear case).
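
A minimal 1D sketch of such an explicit scheme is given below. It is an illustration rather than the implementation behind the figures: the flux-form discretization with averaged coefficients, the reflecting borders, the synthetic noisy ramp and all function names are assumptions of this sketch, while the coefficient follows (15) and the time step is kept below the bound $0.25\cos\theta$ stated above for r = 1, h = 1.

```python
import cmath
import math
import random

def ramp_preserving_step(I, dt, k, theta):
    """One explicit step of I_t = (c(Im(I)) I_x)_x on a unit grid, reflecting borders."""
    n = len(I)
    # Diffusion coefficient: e^{i theta} / (1 + (Im(I) / (k theta))^2)
    c = [cmath.exp(1j * theta) / (1.0 + (v.imag / (k * theta)) ** 2) for v in I]
    out = []
    for i in range(n):
        l, r = max(i - 1, 0), min(i + 1, n - 1)
        flux_right = 0.5 * (c[i] + c[r]) * (I[r] - I[i])
        flux_left = 0.5 * (c[i] + c[l]) * (I[i] - I[l])
        out.append(I[i] + dt * (flux_right - flux_left))
    return out

def ramp_preserving_diffusion(signal, t_end, k=0.07, theta=math.pi / 30):
    dt = 0.2 * math.cos(theta)  # below the bound 0.25 * cos(theta)
    I = [complex(v) for v in signal]
    for _ in range(int(t_end / dt)):
        I = ramp_preserving_step(I, dt, k, theta)
    return I

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Synthetic data (an assumption of this sketch): a ramp edge plus Gaussian noise.
random.seed(1)
clean = [min(max((x - 80) / 40.0, 0.0), 1.0) for x in range(200)]
noisy = [v + random.gauss(0.0, 0.05) for v in clean]
result = ramp_preserving_diffusion(noisy, t_end=2.5)
denoised = [v.real for v in result]
print(mean_abs_error(noisy, clean), mean_abs_error(denoised, clean))
```

The real part of the result is the denoised signal, and the imaginary part divided by $\theta$ plays the role of the time-scaled smoothed second derivative that controls the coefficient.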

In Figs. 4 and 5 we show an example of a noisy ramp denoised by a P-M
process in comparison to the above process (with $\theta = \pi/30$). One can notice that
the known staircasing effect of P-M does not occur in our nonlinear complex
scheme. In Fig. 6 the process is applied to an apple image that contains both
sharp (step) and gradual (ramp) edges. Note that using the regularized P-M
version of Catté et al. [?] produces staircasing results similar to the original P-M
process.

4 Conclusion 

The fundamental solution for the linear complex diffusion indicates that there
exists a stable process for $\theta \in (-\frac{\pi}{2}, \frac{\pi}{2})$. In the case of small $\theta$ two observations are
relevant to the application of the complex diffusion process in image processing:
Fig. 4. Perona-Malik nonlinear diffusion of a ramp edge (k = 0.1). Left - original
(top) and noisy ramp signal (white Gaussian, SNR = 15 dB). Middle - denoised
signal at times 0.25, 1, 2.5, from top to bottom, respectively. Right - respective
values of the coefficient c.

Fig. 5. Nonlinear complex diffusion of a ramp edge ($\theta = \pi/30$, k = 0.07).
Left - real values of denoised signal at times 0.25, 1, 2.5, from top to bottom,
respectively. Middle - respective imaginary values, right - respective real values
of c.



The real function equation is effectively decoupled from the imaginary one, and
behaves like a real linear diffusion process; the imaginary part is approximately
a smoothed second derivative of the real part. Therefore, we can regard the
Gaussian and Laplacian "pyramids" (scale-spaces) as results of a single complex
diffusion equation.

Although the nonlinear scheme remains to be better analyzed and under- 
stood, a ramp preserving denoising process was demonstrated as an example of 
possible applications of complex diffusion schemes. 
Abstract. This paper addresses the problem of feature enhancement in
noisy images when the feature is known to be constrained to a manifold.
As an example, we study the problem of direction denoising. This
problem was treated recently and several solutions were proposed. The
various solutions share the same structure. They are composed of two
terms: a diffusion term and a projection term. Analytically, the solutions
differ in the diffusion part. The projection part is equivalent in all
works. Yet, as is often the case, the analytically equivalent projection
terms differ from a numerical viewpoint. We present in this work a new
parameterization of the problem that enables us to work always in a
numerically stable way.



1 Introduction 

Many objects of low-level vision are scalar and vector fields of various types. This
is the case for gray-value images, color images, movies, 3D volumetric images and
disparity in stereo vision, to name just a few examples. These vector fields are
traditionally considered as taking values in $\mathbb{R}^n$. Several types of vector fields, though,
are constrained in a non-trivial way. When the constraint can be expressed via
the vanishing of a smooth function, e.g. a polynomial, the vector fields take their
values in a non-Euclidean manifold. One notable example is the direction vector
field that assigns to each pixel in the image a local direction. These directions are
unit length vectors that span the unit n-dimensional sphere $S^n$. Other classes
of non-Euclidean vector fields are perceptually treated color images [?], and
gradients [?]. The relation to non-linear filters and short-time kernels is studied
in [?]. Here we study the n-dimensional direction vector fields and spherically
constrained color models via the Beltrami framework [?].

Almost all works that try to minimize a functional with respect to a constrained
quantity embed the constrained feature in a higher dimensional Euclidean
space and perform the minimization for the coordinates of this unconstrained
space. The common approach is to alternate minimization of an unconstrained
function and a projection on the constraint manifold. The treatment
of direction diffusion was recently addressed, along these lines, in the low-level
vision community [?]. The projection in these works was incorporated in
the PDE directly, but the dynamical coordinates are the unconstrained ones and
we are still facing the problem of projection due to numerical errors.

We proposed recently [?] to work directly on the constraint manifold. Once
a local coordinate system is chosen for the embedding space and the optimization
is done directly in these coordinates, we can never leave the feature manifold
and we avoid the problem of projection. The difficulty represented in the problem of
projection is transformed to the problem of the choice of local coordinate system,
and the appearance of parametric singularities. Since a compact manifold
demands at least two charts to cover it, we face the problems of choosing the charts
and deciding locally on which of the charts to work. In our previous publication
we treated the problem. We generalize, in this paper, the analysis to $S^n$ and
describe two different possible choices for the charts. We describe the hemispheric
and the stereographic coordinate systems and show the numerical advantage of
the latter. Our solution produces an adaptive smoothing process, which preserves
orientation discontinuities by the projection of the mean curvature flow in the
non-Euclidean space to the feature coordinates. The proposed solution works
for all dimensions and codimensions, and overcomes possible parameterization
singularities by introducing several internal coordinates on different charts.



2 The Beltrami Framework 

Let us briefly review the Beltrami geometric framework for non-linear diffusion
in image processing and analysis [?].

Representation and Riemannian Structure: We represent an image and
other local features as an embedding map of a Riemannian manifold in a higher
dimensional space (see [?] for the earliest time this idea was put forward for
color images). In general, an n-dimensional (Riemannian) manifold is defined by
a collection of maps from charts of the manifold to $\mathbb{R}^n$. Each chart covers part
of the manifold. Their union covers the whole manifold and the transformation
of the coordinates on the intersection between any two charts is smooth. The
Riemannian structure transforms in a proper way under any change of the
coordinate system. We denote the coordinates on the two-dimensional surface by
$(x^1, x^2)$, and the coordinates on a chart of the embedding space by $(Y^1, \ldots, Y^n)$.
The embedding space is a hybrid spatial-feature space. The first two coordinates
$(Y^1, Y^2)$ are the spatial coordinates and the rest $(Y^3, \ldots, Y^n)$ are the
feature coordinates. The simplest example is the image itself, which is represented
as a 2D surface embedded in $\mathbb{R}^3$. We denote the map by $Y : \Sigma \to \mathbb{R}^3$,
where $\Sigma$ is a two-dimensional surface. The map Y is given in this example by
$(Y^1 = x^1, Y^2 = x^2, Y^3 = I(x^1, x^2))$. We choose on this surface a Riemannian
structure, namely, a metric. The metric is a positive definite and symmetric
2-tensor that may be defined through the local distance measurements

$$ ds^2 = g_{11}(dx^1)^2 + 2g_{12}\,dx^1 dx^2 + g_{22}(dx^2)^2. \tag{1} $$
We use below the Einstein summation convention in which the above equation
reads $ds^2 = g_{\mu\nu}dx^\mu dx^\nu$, where repeated indices are summed over. We denote the
inverse of the metric by $g^{\mu\nu}$.

Selecting the Induced Metric as the Image Metric: A reasonable assumption
is that distances we measure in the embedding spatial-feature space, such as
distance between pixels and difference between gray-levels, correspond directly
to the distance measured on the image manifold. This is the assumption of
isometric embedding under which we can calculate the image metric in terms of the
embedding maps $Y^i$ and the embedding space metric $h_{ij}$. This follows directly
from the fact that the length of infinitesimal distances on the manifold can be
calculated in the manifold and in the embedding space with the same result.
Formally, $ds^2 = g_{\mu\nu}dx^\mu dx^\nu = h_{ij}dY^i dY^j$. By the chain rule, $dY^i = \partial_\mu Y^i dx^\mu$,
we get $ds^2 = g_{\mu\nu}dx^\mu dx^\nu = h_{ij}\partial_\mu Y^i \partial_\nu Y^j dx^\mu dx^\nu$, from which we have

$$ g_{\mu\nu} = h_{ij}\,\partial_\mu Y^i\, \partial_\nu Y^j. \tag{2} $$
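
For the simplest example above, the gray-value image embedded as the graph $(x^1, x^2, I(x^1, x^2))$ with a Euclidean embedding metric $h_{ij} = \delta_{ij}$, the induced metric reduces to $g_{\mu\nu} = \delta_{\mu\nu} + \partial_\mu I\, \partial_\nu I$. The Python sketch below evaluates it at one pixel; the central-difference derivatives, the row/column convention, the function names and the planar test image are assumptions of this illustration.

```python
import math

def induced_metric(image, i, j):
    """Induced metric g = delta + grad(I) grad(I)^T at interior pixel (row i, column j)."""
    Ix = 0.5 * (image[i][j + 1] - image[i][j - 1])  # derivative along x^1 (columns)
    Iy = 0.5 * (image[i + 1][j] - image[i - 1][j])  # derivative along x^2 (rows)
    return [[1.0 + Ix * Ix, Ix * Iy],
            [Ix * Iy, 1.0 + Iy * Iy]]

def area_element(g):
    """Square root of det(g), the local area magnification of the image surface."""
    return math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])

# Planar test image I(x, y) = 2x + 3y, so that the derivatives are known exactly.
image = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(5)]
g = induced_metric(image, 2, 2)
print(g, area_element(g))
```

For the planar image the determinant equals $1 + I_x^2 + I_y^2$, so flat regions have unit area element and steep regions a larger one.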

Polyakov Action, a Measure on the Space of Embedding Maps: Denote
by $(\Sigma, g)$ the image manifold and its metric, and by $(M, h)$ the
space-feature manifold and its metric. Then the functional
$S[\cdot, \cdot, \cdot]$ attaches a real number to a map $Y : \Sigma \to M$

$$S[Y^i, g_{\mu\nu}, h_{ij}] = \int dV\, \langle \nabla Y^i, \nabla Y^j \rangle_g\, h_{ij}, \qquad (3)$$

where $dV = dx^1 dx^2 \cdots dx^m \sqrt{g}$ is a volume element, and the scalar
product $\langle \cdot, \cdot \rangle_g$ is defined with respect to the image
metric, i.e. $\langle \nabla Y^i, \nabla Y^j \rangle_g = g^{\mu\nu}\partial_\mu Y^i \partial_\nu Y^j$.

This functional, for $m = 2$ (two-dimensional image manifold) and $h_{ij} = \delta_{ij}$,
was first proposed by Polyakov in the context of string theory in high energy
physics. Note that the image metric and the feature coordinates, i.e., inten- 
sity, color, orientation etc. are independent variables. The minimization of the 
functional with respect to the image metric can be solved analytically in the two- 
dimensional case (see for example cm). The minimizer is the induced metric. If 
we choose, a priori, the image metric induced from the metric of the embedding
spatial-feature space M, then the Polyakov action is reduced to an area (volume) 
of the image manifold. 

Using standard methods in the calculus of variations (see cm), the Euler- 
Lagrange equations with respect to the embedding are 

$$-\frac{1}{2\sqrt{g}}\,h^{il}\frac{\delta S}{\delta Y^l} = \frac{1}{\sqrt{g}}\,\partial_\mu\left(\sqrt{g}\,g^{\mu\nu}\partial_\nu Y^i\right) + \Gamma^i_{jk}\langle \nabla Y^j, \nabla Y^k \rangle_g. \qquad (4)$$

Since $(g_{\mu\nu})$ is positive definite, $g \equiv \det(g_{\mu\nu}) > 0$ for
all $x^\mu$. The $1/\sqrt{g}$ factor is the simplest one that does not change
the minimization solution while giving a reparameterization invariant
expression. The operator that is acting on $Y^i$ in the first term is the
natural generalization of the Laplacian from flat spaces to manifolds; it is
called the Laplace-Beltrami operator and is denoted by $\Delta_g$. The second
term involves the Levi-Civita connection whose coefficients are given in
terms of the metric of the embedding space

$$\Gamma^i_{jk} = \frac{1}{2}h^{il}\left(\partial_j h_{lk} + \partial_k h_{jl} - \partial_l h_{jk}\right). \qquad (5)$$

This is the term that takes into account the fact that the image surface flows in
a non-Euclidean manifold and not in $\mathbb{R}^n$.

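A map that satisfies the Euler-Lagrange equations
$-\frac{1}{2\sqrt{g}}h^{il}\frac{\delta S}{\delta Y^l} = 0$ is a harmonic map.
The one- and two-dimensional examples are a geodesic curve on a manifold and a
minimal surface.

The non-linear diffusion or “scale-space” equation emerges as the gradient
descent minimization flow

$$Y^i_t = \frac{\partial Y^i}{\partial t} = -\frac{1}{2\sqrt{g}}\,h^{il}\frac{\delta S}{\delta Y^l} = \Delta_g Y^i + \Gamma^i_{jk}\langle \nabla Y^j, \nabla Y^k \rangle_g. \qquad (6)$$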



This flow evolves a given surface towards a minimal surface, and in general, it 
changes continuously a map towards a harmonic map. 
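To make the flow concrete, here is a minimal sketch for the gray-level case only, where the embedding space is Euclidean, the connection term vanishes and the flow reduces to $I_t = \Delta_g I$. For the induced metric $g_{\mu\nu} = \delta_{\mu\nu} + I_\mu I_\nu$ this simplifies to $\Delta_g I = g^{-1/2}\,\mathrm{div}(\nabla I / \sqrt{g})$ with $g = 1 + |\nabla I|^2$. The explicit Euler scheme, the unit grid spacing, the step size `dt` and the name `beltrami_step` are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def beltrami_step(I, dt=0.1):
    """One explicit Euler step of I_t = Delta_g I for a gray-level image.

    Uses Delta_g I = (1/sqrt(g)) * div(grad(I) / sqrt(g)), g = 1 + |grad I|^2,
    with central differences on a unit grid (assumptions of this sketch).
    """
    I = np.asarray(I, dtype=float)
    Iy, Ix = np.gradient(I)
    sqrt_g = np.sqrt(1.0 + Ix**2 + Iy**2)
    # Divergence of the normalised gradient field.
    div = np.gradient(Ix / sqrt_g, axis=1) + np.gradient(Iy / sqrt_g, axis=0)
    return I + dt * div / sqrt_g
```

Flat regions and pure ramps are left unchanged by this step, whereas isolated bumps are lowered. Where the gradient is large, $\sqrt{g}$ is large and the update is small, which is the edge-preserving behaviour of the flow.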



3 Hemispheric Parameterization 

Fiber Geometry: We are interested in the case where the fiber feature space
is the hypersphere $S^n$. We choose to represent the hypersphere $S^n$ as an
n-dimensional manifold embedded in $\mathbb{R}^{n+1}$, with Cartesian
coordinate system $\{U^i\}_{i=3}^{n+3}$, as the constrained hypersurface

$$\sum_{i=3}^{n+3} U^i U^i = 1. \qquad (7)$$

We work in $n+1$ charts (with $2(n+1)$ arcwise connected parts), where
$\{Y^i\}$ are local coordinates. On chart $j$ we have $U^i = Y^i$ for
$i = 3, \ldots, n+3$, $i \neq j$, and
$U^j = \pm\sqrt{1 - \sum_{i \neq j}(Y^i)^2}$. The points $U^j = 0$ do not
belong to this chart, and it has two unconnected components, the positive and
negative values of $U^j$. We compute below the flow for the chart $j = n+3$.
Other charts are computed similarly.

Denote the metric elements, for the feature space only, by $h_{ij}$. The metric
elements, and the inverse metric elements, are given by

$$h_{ij} = \sum_{k=3}^{n+3}\frac{\partial U^k}{\partial Y^i}\frac{\partial U^k}{\partial Y^j} = \delta_{ij} + \frac{Y^i Y^j}{1 - \sum_{k=3}^{n+2}(Y^k)^2}, \qquad h^{ij} = \delta^{ij} - Y^i Y^j. \qquad (8)$$
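The two expressions in Eq. (8) can be checked numerically. The sketch below builds both matrices for a given coordinate vector $Y$ inside the unit ball; the function name `hemisphere_metric` is an assumption made for illustration.

```python
import numpy as np

def hemisphere_metric(Y):
    """Metric h_ij and inverse h^ij of Eq. (8) in a hemispheric chart.

    Y holds the n local coordinates (Y^3, ..., Y^{n+2}); the chart requires
    sum(Y**2) < 1, since the equator is the singular set of the chart.
    """
    Y = np.asarray(Y, dtype=float)
    r2 = float(Y @ Y)
    if r2 >= 1.0:
        raise ValueError("point lies outside the chart (sum of squares >= 1)")
    eye = np.eye(Y.size)
    h = eye + np.outer(Y, Y) / (1.0 - r2)
    h_inv = eye - np.outer(Y, Y)
    return h, h_inv
```

The factor $1/(1 - \sum_k (Y^k)^2)$ grows without bound as the point approaches the boundary of the chart, which is why the choice of chart matters numerically.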

The Induced Metric: Now we are in a position to compute the induced metric
on the image surface. The embedding map is
$(Y^1 = x^1, Y^2 = x^2, Y^3(x^1, x^2), \ldots, Y^{n+2}(x^1, x^2))$.
The induced metric is given by

$$g_{\mu\nu} = \delta_{\mu\nu} + \sum_{i,j=3}^{n+2} h_{ij}\,Y^i_\mu Y^j_\nu, \qquad (9)$$

where $Y^i_\mu = \partial Y^i / \partial x^\mu$.

The Flow Equations: The non-zero Levi-Civita coefficients for this
parameterization are derived directly by substituting Eq. (8) into Eq. (5).
The simple result is

$$\Gamma^i_{jk} = Y^i h_{jk}(Y^3, \ldots, Y^{n+2}), \qquad i, j, k = 3, \ldots, n+2. \qquad (10)$$

The minimization of the Polyakov action leads to the following evolution
equations

$$Y^i_t = \Delta_g Y^i + 2Y^i - Y^i\,\mathrm{Tr}(g^{\mu\nu}), \qquad (11)$$

where $\mathrm{Tr}\,X$ is the trace of the matrix $X$.
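The step from Eqs. (9) and (10) to Eq. (11) is short and may be worth spelling out. The following is a sketch of the intermediate algebra, using only the definitions above and the fact that $g^{\mu\nu}g_{\mu\nu} = 2$ for a two-dimensional image manifold:

```latex
\begin{align*}
\Gamma^i_{jk}\, g^{\mu\nu}\,\partial_\mu Y^j\,\partial_\nu Y^k
  &= Y^i\, g^{\mu\nu}\, h_{jk}\,\partial_\mu Y^j\,\partial_\nu Y^k
     && \text{by Eq. (10)} \\
  &= Y^i\, g^{\mu\nu}\,\bigl(g_{\mu\nu} - \delta_{\mu\nu}\bigr)
     && \text{by Eq. (9)} \\
  &= Y^i\,\bigl(2 - \mathrm{Tr}(g^{\mu\nu})\bigr).
\end{align*}
```

Adding this to the Laplace-Beltrami term of the gradient descent flow gives the right-hand side of Eq. (11).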

One- and Two-Dimensional Directions: Two charts are needed to cover $S^1$.
The charts are $\{(Y^3, Y^4) \mid (Y^3)^2 + (Y^4)^2 = 1,\ Y^4 \neq 0\}$ and
$\{(Y^3, Y^4) \mid (Y^3)^2 + (Y^4)^2 = 1,\ Y^3 \neq 0\}$. In each chart we have
one coordinate ($Y^3$ and $Y^4$ respectively) that parameterizes that chart.
The flows in the charts are

$$Y^3_t = \Delta_g Y^3 + Y^3\,\frac{g-1}{g} \qquad (12)$$

and

$$Y^4_t = \Delta_g Y^4 + Y^4\,\frac{g-1}{g}. \qquad (13)$$

The metric in the two charts is identical in its form and depends on the
corresponding coordinate that parameterizes each chart. In the implementation
we compute the diffusion for $Y^3$ and $Y^4$ simultaneously and determine the
updated values as follows: we take the values
$(Y^3, \mathrm{sign}(Y^4)\sqrt{1 - (Y^3)^2})$ for the range
$(Y^3)^2 < (Y^4)^2$, and the values
$(\mathrm{sign}(Y^3)\sqrt{1 - (Y^4)^2}, Y^4)$ for the range
$(Y^4)^2 \le (Y^3)^2$. The switching point is $Y^i = 1/\sqrt{2}$.
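The update rule just described can be sketched as follows. At every pixel the coordinate of smaller magnitude is trusted and the other is rebuilt from the unit-norm constraint, keeping its sign. The function name `recombine_circle` is hypothetical, and the clipping inside the square root is a numerical safeguard added for this sketch.

```python
import numpy as np

def recombine_circle(Y3, Y4):
    """Merge the two hemispheric charts of S^1 after a diffusion step.

    Y3 and Y4 are the separately diffused coordinates. Where Y3^2 < Y4^2 the
    value Y3 is kept and Y4 is rebuilt, otherwise Y4 is kept and Y3 is rebuilt.
    """
    Y3 = np.asarray(Y3, dtype=float)
    Y4 = np.asarray(Y4, dtype=float)
    use_y3 = Y3**2 < Y4**2
    # Clip guards against tiny negative arguments caused by round-off.
    rebuilt4 = np.sign(Y4) * np.sqrt(np.clip(1.0 - Y3**2, 0.0, None))
    rebuilt3 = np.sign(Y3) * np.sqrt(np.clip(1.0 - Y4**2, 0.0, None))
    U3 = np.where(use_y3, Y3, rebuilt3)
    U4 = np.where(use_y3, rebuilt4, Y4)
    return U3, U4
```

The returned pair has unit norm by construction, so the constraint is restored after every iteration even if the two diffused coordinates have drifted apart.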

Similarly for the two-dimensional sphere we have three charts. The charts 
are North-South, East-West, and front-rear hemispheres. The corresponding 
flows for the three charts are 

$$(Y^i)_t = \Delta_g Y^i + 2Y^i - Y^i\,(g^{11} + g^{22}), \qquad i = 1, 2, 3. \qquad (14)$$

At each point we find the largest component and compute the update of the 
value in the patch that is locally parameterized by the other two components. 
This way we always perform the computation as far as possible from the
singularities. The largest component serves as its own sign holder. The
switching point is $Y^i = 1/\sqrt{3}$.

In general we need more and more charts as $n$ increases. Note that the
singular point in the chart $j$ is $U^j = 0$. The switching point is therefore
$1/\sqrt{n+1}$, and we are forced to work closer and closer to the singularity
as $n$ increases. In the following section we present a different
parameterization in which we always work at the furthest point from the
singularity.
4 Stereographic Parameterization

Fiber Geometry: The hypersphere $S^n$ is realized as the locus of all the
points in $\mathbb{R}^{n+1}$ that satisfy the constraint
$\sum_{i=3}^{n+3} U^i U^i = 1$. We denote by $Y^i$, for $i = 3, \ldots, n+2$,
the Cartesian coordinate system on the unit disk
$D^n = \{U \in \mathbb{R}^{n+1} \mid \sum_{i=3}^{n+2} U^i U^i \le 1,\ U^{n+3} = 0\}$.
The intersection of the line between the north pole and a point on the south
hemisphere with the disk $D^n$ serves as a parameterization of the south
hemisphere. Similarly for the north hemisphere and the south pole (see Fig. 1
for the one- and two-dimensional coordinate systems).





Fig. 1. The one-dimensional (left) and two-dimensional (right) stereographic 
coordinate systems. 



Explicitly, these transformations are given by

$$Y^i = \frac{U^i}{1 \mp U^{n+3}}, \qquad i = 3, \ldots, n+2.$$

Inverting these relations we find

$$U^i = \frac{2Y^i}{A+1}, \quad i = 3, \ldots, n+2, \qquad \text{and} \qquad U^{n+3} = \pm\frac{A-1}{A+1}, \qquad (15)$$

where $A = \sum_{i=3}^{n+2}(Y^i)^2$ and the upper (lower) sign is for the south
(north) hemispheres.
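The projection and its inverse in Eq. (15) are easy to check numerically. In the sketch below the last entry of `U` plays the role of $U^{n+3}$, and the `south` flag selects the upper sign (projection from the north pole). The function names are hypothetical.

```python
import numpy as np

def to_stereographic(U, south=True):
    """Map a unit vector U in R^{n+1} to stereographic coordinates Y in R^n.

    south=True projects from the north pole: Y = U[:-1] / (1 - U[-1]).
    south=False projects from the south pole: Y = U[:-1] / (1 + U[-1]).
    """
    U = np.asarray(U, dtype=float)
    sign = -1.0 if south else 1.0
    return U[:-1] / (1.0 + sign * U[-1])

def from_stereographic(Y, south=True):
    """Inverse map of Eq. (15): rebuild the unit vector from Y."""
    Y = np.asarray(Y, dtype=float)
    A = float(Y @ Y)
    last = (A - 1.0) / (A + 1.0)
    if not south:
        last = -last
    return np.append(2.0 * Y / (A + 1.0), last)
```

For a point on the southern hemisphere the south-chart coordinates satisfy $A < 1$, while the north-chart coordinates of the same point satisfy $A > 1$; on the equator both charts give $A = 1$.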

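The Induced Metric and Flow Equations: Now we are ready to compute
the induced metric of our feature space,

$$h_{ij} = \sum_{k=3}^{n+3}\frac{\partial U^k}{\partial Y^i}\frac{\partial U^k}{\partial Y^j} = \frac{4}{(1+A)^2}\,\delta_{ij}, \qquad i, j = 3, \ldots, n+2. \qquad (16)$$

The image surface metric is the induced one

$$g_{\mu\nu} = \delta_{\mu\nu} + \sum_{i,j=3}^{n+2} h_{ij}\frac{\partial Y^i}{\partial x^\mu}\frac{\partial Y^j}{\partial x^\nu} = \delta_{\mu\nu} + \frac{4}{(1+A)^2}\sum_{i=3}^{n+2}\frac{\partial Y^i}{\partial x^\mu}\frac{\partial Y^i}{\partial x^\nu}.$$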
The Levi-Civita connection is obtained from Eq. (5) and Eq. (16) to be

- Y'Sk^) j, fc = 3, . . . , n + 2. 

The resulting diffusion equations are 

n+3 . 

r/ = AgY^ + ^ {Y^Sjk - yH,, - Y^5u) d^Y^d^Y'^g^^’' (17) 

j.fe=3 ^ 

($i = 3, \ldots, n+2$). This can be rearranged to

$$Y^i_t = \Delta_g Y^i - 4g^{\mu\nu}(\partial_\mu \log(1+A))(\partial_\nu Y^i) + (1+A)(2 - g^{11} - g^{22})Y^i \qquad (18)$$

($i = 3, \ldots, n+2$).

One- and Two-Dimensional Directions: We denote our coordinate system by the
subscripts $s$ (for south) and $n$ (for north). The equations are, therefore,
for the one-dimensional case

$$(Y_s)_t = \Delta_g Y_s - 4g^{\mu\nu}(\partial_\mu \log(1+A))(\partial_\nu Y_s) + (1+A)(2 - g^{11} - g^{22})Y_s, \qquad (19)$$

where $A = Y_s^2$ and the induced metric is a function of $Y_s$. An identical
equation is written for $Y_n$. We solve the north and south equations
simultaneously for values smaller than 1. In each iteration we update the
values which are greater than 1 by the simple relation $Y_s = 1/Y_n$. Note that
the problematic zone(s), i.e. $\pm 1$, are as far as possible from the
singularities, i.e. the poles.

The two-dimensional case is managed similarly via

$$(Y_s^1)_t = \Delta_g Y_s^1 - 4g^{\mu\nu}(\partial_\mu \log(1+A_s))(\partial_\nu Y_s^1) + (1+A_s)(2 - g^{11} - g^{22})Y_s^1,$$
$$(Y_s^2)_t = \Delta_g Y_s^2 - 4g^{\mu\nu}(\partial_\mu \log(1+A_s))(\partial_\nu Y_s^2) + (1+A_s)(2 - g^{11} - g^{22})Y_s^2,$$

where $A_s = (Y_s^1)^2 + (Y_s^2)^2$ and the induced metric is a function of
$Y_s^i$. As in the one-dimensional case we solve simultaneously for the south
and north patches and work with the coordinates $Y^i$ whose magnitude is
smaller than 1. The update for values which are greater than 1 after the
diffusion (in each iteration) is done by $Y_s^i = Y_n^i / A_n$. The decision
zone, along the equator, is the most numerically stable region since it is
the furthest from the poles where singularities may appear.
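A minimal sketch of the chart update described above follows. The arrays `Ys` and `Yn` hold the south and north stereographic coordinates with the feature index first; wherever a chart's squared norm exceeds 1 after a diffusion step, its values are overwritten from the other chart by inversion in the unit sphere, $Y_s^i = Y_n^i / A_n$ (and symmetrically for the north chart). The function name `switch_charts` and the small constant `eps` that guards the division are assumptions of this sketch, and points lying exactly at a pole are not treated specially.

```python
import numpy as np

def switch_charts(Ys, Yn, eps=1e-12):
    """Synchronise south and north stereographic charts after diffusion.

    Ys, Yn: arrays of shape (n, ...) with the feature index on axis 0.
    A chart is trusted where its squared norm A is at most 1; elsewhere it is
    rebuilt from the other chart via Y_other / A_other.
    """
    Ys = np.asarray(Ys, dtype=float)
    Yn = np.asarray(Yn, dtype=float)
    As = np.sum(Ys**2, axis=0)
    An = np.sum(Yn**2, axis=0)
    # eps only avoids division by zero; it does not handle exact poles.
    new_Ys = np.where(As > 1.0, Yn / np.maximum(An, eps), Ys)
    new_Yn = np.where(An > 1.0, Ys / np.maximum(As, eps), Yn)
    return new_Ys, new_Yn
```

In the one-dimensional case the same rule reduces to $Y_s = 1/Y_n$, in agreement with the relation stated earlier.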

5 Color Diffusion and Experimental Results 

First, we use machine color space as our spectral model, where we first restrict
the colors to a unit sphere in the RGB space. The sphere is centered at the RGB
box as shown in Figure 2. The filter in this case is based on a flow on the
sphere, while the magnitude is kept fixed. A coupled magnitude-chroma process as
discussed in [3] may be envisaged as well.

In all these examples the noise was introduced in the chromatic channels 
only. Note that although the chromatic channels carry a small fraction of the
energy of the signal, they have a very pronounced perceptual effect. Popular denoising
techniques which are luminosity based are doomed to fail in this situation. 
Fig. 2. Colors are restricted to a unit sphere, in the RGB unit box, while
the magnitude is treated separately. The original image (left), the noisy image 
(middle), and the filtered image (right). The vector fields of part of the images, 
before (left) and after (right) the flow. 



6 Concluding Remarks 

There are three important issues in the process of denoising a constrained feature 
field. The first is to make the process compatible with the constraint in such a 
way that the latter is never violated along the flow. The second is the type of 
regularization which is applied in order to preserve significant discontinuities 
of the feature field while removing noise. The third is numerical accuracy and 
stability. 

We studied, in this paper, the direction diffusion flow where the feature
manifold is the hypersphere $S^n$. We proposed to use intrinsic coordinates of the
constraint’s manifold in order to be compatible with the constraint along the 
flow. We used the Beltrami flow that projects the mean curvature on the fea- 
ture space coordinates. This operation preserves edges while removing the noise. 
Finally we analyzed the hemispheric and stereographic parameterizations for 
the hypersphere geometry. We showed that the stereographic parameterization 
is numerically superior since the decision between the two charts (North and 
South hemispheres) is done along the equator, far away from the singularities 
which are located in the North and South poles. 

The result of this algorithm is an adaptive smoothing process for a con- 
strained feature space in every dimension and codimension. Application to a 
model of color space was used to demonstrate the flows and their edge preserv- 
ing properties. 
Abstract. In this paper we derive scale space methods for inverse problems
which satisfy the fundamental axioms of fidelity and causality and
we provide numerical illustrations of the use of such methods in de- 
blurring. These scale space methods are asymptotic formulations of the 
Tikhonov- Morozov regularization method. The analysis and illustrations 
relate diffusion filtering methods in image processing to Tikhonov regu- 
larization methods in inverse theory. 



1 Introduction 

Scale space methods in signal and image processing and regularization methods
for the solution of inverse problems have developed rather independently. In 
fact, there are major philosophical differences between the image processing and 
regularization communities as to what constitutes adequate numerical methods 
for solving problems in the respective fields. This is quite surprising since there 
are many problems which are relevant both in image processing and inverse 
problems. 

In order to discuss different approaches to numerical methods in the two 
areas we relate two paradigms: 

— diffusion filtering methods in image processing and 

— Tikhonov-type regularization methods in inverse problems. 

Moreover, we show that there are adequate modifications of both methods which 
allow their application in the other area. In particular, we derive scale space 
methods for inverse problems. 

One of the most important scale space methods in image processing and 
eomputer vision is diffusion filtering (see e.g. Ill 61 1. Let 17 = [0,1]^. To analyze 
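an image $u^\delta$ defined on $\Omega = [0,1]^2$ the diffusion process

$$\frac{\partial u}{\partial t} = \nabla \cdot (g(|\nabla u|^2)\nabla u) \text{ on } \Omega, \qquad \frac{\partial u}{\partial n} = 0 \text{ on } \partial\Omega, \qquad u(0) = u^\delta \text{ on } \Omega \qquad (1)$$

is solved. The sequence of filtered or diffused images
$\mathcal{D} := \{u(\cdot, t) : t \ge 0\}$ is used to analyze $u^\delta$. When
it is helpful to specify the dependence of $u$ on the initial data $u^\delta$,
we write $u_{u^\delta}(\cdot, t)$.

The sequence of images $\mathcal{D}$ defined by the diffusion process has the
following properties:

Fidelity: $u(\cdot, t) \to u^\delta$ as $t \to 0$.

Causality: Given $0 < t_0 < T$, $u(\cdot, T)$ is completely determined by $u(\cdot, t_0)$.

Euclidean Invariance: If $A$ is an isometry, then $u_{u^\delta \circ A}(\cdot, t) = u_{u^\delta}(\cdot, t) \circ A$.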

Diffusion filtering methods are closely related to (iterative) Tikhonov-type
regularization methods. The implicit Euler method for the solution of (1) is

$$\frac{u(t_n) - u(t_{n-1})}{t_n - t_{n-1}} = \nabla \cdot (g(|\nabla u|^2)\nabla u)(t_n). \qquad (2)$$

If there exists a convex function $h$ satisfying $h' = g$, then the minimizer $u$ of

$$\|u - u(t_{n-1})\|^2_{L^2(\Omega)} + (t_n - t_{n-1})\int_\Omega h(|\nabla u|^2) \qquad (3)$$

provides an approximation of $u(t_n)$. Using just one implicit time step of
size $\alpha$ is known as the method of Tikhonov regularization. This method
consists in minimizing the functional

$$f_{\mathrm{Tik}}(u) := \|u - u^\delta\|^2_{L^2(\Omega)} + \alpha\int_\Omega h(|\nabla u|^2). \qquad (4)$$

Tikhonov regularization violates the causality principle: let
$\mathcal{R} := \{u(\cdot, \alpha) : \alpha \ge 0\}$ be the set of regularized
images; then it is not possible to calculate $u(\cdot, \alpha)$ from the
knowledge of $u(\cdot, \alpha_1)$ for $0 < \alpha_1 < \alpha$. Tikhonov
regularization is a widely used method for solving ill-posed operator equations

$$F(u) = y^0, \qquad (5)$$

where $F$ is an operator which lacks a continuous inverse. The generalization of
(4) for solving (5) is to minimize the functional

$$f(u) := \|F(u) - y^\delta\|^2_{L^2(\Omega)} + \alpha\int_\Omega h(|\nabla u|^2). \qquad (6)$$
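The relation between Eqs. (2), (3) and (4) can be illustrated in the simplest linear, one-dimensional setting, $g \equiv 1$ and $h(s) = s$. There the minimizer of the discrete Tikhonov functional solves $(I - \alpha L)u = u^\delta$, where $L$ is the discrete Laplacian with homogeneous Neumann boundary conditions, and this is one implicit Euler step of linear diffusion with step size $\alpha$. The sketch below is a sketch under these assumptions (unit grid spacing, dense matrices, hypothetical function names) and not the authors' code.

```python
import numpy as np

def neumann_laplacian(n):
    """Discrete 1D Laplacian with homogeneous Neumann boundary conditions."""
    D = np.diff(np.eye(n), axis=0)      # forward differences, shape (n-1, n)
    return -D.T @ D

def tikhonov_denoise(u_delta, alpha):
    """Minimise sum((u - u_delta)^2) + alpha * sum(diff(u)^2).

    The normal equations are (I - alpha * L) u = u_delta, which coincide with
    one implicit Euler step of u_t = L u starting from u_delta.
    """
    u_delta = np.asarray(u_delta, dtype=float)
    n = u_delta.size
    return np.linalg.solve(np.eye(n) - alpha * neumann_laplacian(n), u_delta)

def tikhonov_functional(u, u_delta, alpha):
    """Discrete version of Eq. (4) with h(s) = s."""
    return np.sum((u - u_delta) ** 2) + alpha * np.sum(np.diff(u) ** 2)
```

Calling `tikhonov_denoise` repeatedly on its own output reproduces the iterated scheme of Eq. (3); a single call with a large `alpha` is the classical Tikhonov solution, which cannot be obtained from a solution with smaller `alpha` without access to the original data.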



For some survey references concerned with Tikhonov regularization for the so- 
lution of linear ill-posed problems we refer to |l!Ilfil2ll()ltilH ; for the solution of 
nonlinear ill-posed problems see mmm- 

The concept of diffusion filtering cannot be used for the solution of ill-posed 
operator equations. Arguing as before, the optimality criterion for the minimizer 
of (0 is F'{u)*^{F{u) — y^) = aV • (/i'(|V{tp)Vu) , where F'{u)*^ denotes the 
L^-adjoint of F'{u), i.e., F'{u)*^{v)w = vF'(u)(w) for all v G w G 

In the case of noise-free attainable data, i.e., for y^ = y^ = F{ij)), we 

have 

F'{u)*^{F{u) - F{u^)) = aV • {F{\Vu\^)Vu) 
and there exists an associated diffusion-type methodology 
Due to the ill-posedness of the operator equation (5) there will generally not
exist a solution of (5) when $y^0$ is replaced by $y^\delta \neq y^0$. The ill-posedness thus
prohibits an a-priori estimation of an approximation of . Thus method m is 
inappropriate for calculating a scale space of an inverse problem. 

We find it convenient to use the abstract formulation of Tikhonov regular- 
ization (0. Let F : X ^ Y he a,n operator between two Hilbert spaces X 
and Y. Then the abstract setting of Tikhonov regularization requires minimiza- 
tion of the functional f{u) := ||T’(m) — y^Wy + , where || • ||jf and || • ||y 

denote the associated norms on X, Y, respectively. The optimality condition 
for a minimum of this functional is F' {F{u) — y^) + au = 0, where 
F'(u)*^^ denotes the adjoint of F'{u) with respect to the spaces X and Y, 
i.e., \f' v,w) ^ = {v,F'{u)w)y for all u G F and w £ X . The relation 
to diffusion filtering becomes apparent if we use F = I, Y = L^{f2) and let X 
be the Sobolev space of one time differentiable functions on 17, and use 

the i7^(l7)-seminorm on X: || • |j|- = || • |j^i := |V • p . In this setting the 
minimizer u of the Tikhonov functional satisfies 

du 

u — — a Ail = 0 on 17 , — = 0 on dfi . 

on 

This implies that in a formal sense F'{x)*^^ = —A~^F'{x)*^. 

The iterative Tikhonov-Morozov method is a variant of Tikhonov regulariza- 
tion for solving inverse problems. This method consists of iteratively minimizing 
the sequence of functionals 

$$f_n(u) := \|F(u) - y^\delta\|_Y^2 + \alpha_n\|u - u_{n-1}\|_X^2, \qquad n = 1, 2, \ldots \qquad (8)$$

If the functionals are convex, then the minimizers are characterized as the 
solutions of 

F\unY’^^{F{ur,)-y^) + an{un-Un-i) = h, n=l,2,--- (9) 

Typically in the Tikhonov-Morozov method one sets $u_0 = 0$. But any other
choice is suitable as well. For example, a-priori information on the solution may
be incorporated in the initial approximation $u_0$.

Taking = l/(t„ — t„_i) shows that and u„_i can be considered as 
approximations of the solution u of the asymptotic Tikhonov-Morozov filtering 
technique 

du 

FTu)*^^ (F(u) - yY + — = 0 , with u(0) = 0 . (10) 

at 

For $F = I$, the identity mapping from $H^1(\Omega)$ into $L^2(\Omega)$, the
iterative Tikhonov-Morozov method generates minimizers of the functionals

$$f_n(u) := \|u - u^\delta\|^2_{L^2(\Omega)} + \alpha_n\|u - u_{n-1}\|^2_{H^1(\Omega)}, \qquad n = 1, 2, \ldots . \qquad (11)$$

Accordingly, the asymptotic Tikhonov-Morozov method consists in solving the
differential equation of third order
$u - u^\delta = \Delta \frac{\partial u}{\partial t}$ with $u(0) = 0$. In
earlier work we established the following properties of $u$ via semi-group
theory: causality, Euclidean invariance and

inverse fidelity: $u(\cdot, t) \to u^\delta$ as $t \to \infty$.

These three properties justify the name inverse scale space method for m- 
In Section 2 we discuss the asymptotic Tikhonov-Morozov method for
deblurring images. In this case $F$ is a linear integral operator. For this particular
model problem we can motivate preferences of different numerical methods in 
inverse problems and image processing. 



2 Deblurring with a Scale Space Method 



In this section we consider a problem of deblurring data. We aim to recover a
function on 17 = [0, 1]^ given (blurred) data 



y 



^ = Kv) + 



noise := 



|a; — y\\)u\y) dy + noise 



in 



on 17. To formulate the Tikhonov-Morozov method we have to specify a similarity 
measure for the data and an appropriate function space containing rtb 

In this section we restrict our attention to those in one of the following 
two spaces: 



1. the Banach space $W^{1,p}(\Omega)$, with $p > 1$, of functions $u$ satisfying

$$\|u\|_{W^{1,p}} := \left(\int_\Omega |\nabla u|^p + \omega|u|^p\right)^{1/p} < \infty,$$

with an appropriate positive weighting parameter $\omega > 0$.

2. the space $BV(\Omega)$ of functions of bounded variation, that is, the class
of functions $u$ satisfying

$$\|u\|_{BV(\Omega)} := \int_\Omega (|\nabla u| + \omega|u|) < \infty.$$

For a function $u \in BV(\Omega)$ the term $|\nabla u|$ has to be understood as a
measure (see 0). 

An appropriate choice for the similarity measure is the L^(l7)-norm. Depending 
on a-priori information on it is instructive to study the Tikhonov-Morozov 
method in a variety of settings. 

— For G W^’^(f2), p > 1, the corresponding Tikhonov-Morozov method 
consists in minimizing the functional 

f]Yi,p{u) '■= \\Ku — y II -|- On ||fi ~ Wn-l • (12) 

— For G BV (n) the Tikhonov-Morozov method consists in minimizing 

fBv{u) ■■= \\Ku - /||i 2 (j 2 ) + Oln\\u - Un-l||sy(fi) , (13) 



Inverse Scale Space Theory for Inverse Problems 321 



Since the operator K is self-adjoint on L^{Q), the asymptotic Tikhonov-Morozov 
method in the = kP^’^-setting reads as follows 



K{Ku{x, t) — y^{x)) = {A — luI) — {x, t) for (x, t) G f2 x (0, oo) 
Oil 

— (x,t) = 0 for (x,t) G dSl X (0,oo) , m( 0) = 0 for a; G J7 . 

The minimize!' u of fw^’P h^is to satisfy 

K{Kur^ - y^) = a„|v • (||V(m - Un-iW-^V{u - - 

a„|w||u - Un-lV~‘^{u - Un-l) ■ 

Introducing the relation 

_ 2 1 
P {tn - 



(14) 



(15) 



(16) 



between the regularization parameters and the time discretization we derive the 
asymptotic Tikhonov-Morozov method on 



K{Ku 




du 



P-2 



du 



(17) 



For p = 1 the relation (I I bll degenerates, indicating that there is no asymptotic 
integro-differential equation for the Tikhonov-Morozov method on BV{f2). 

One of the most significant differences between diffusion filtering and itera- 
tive Tikhonov-Morozov regularization is that a small time step-size in the diffu- 
sion filtering method results in very large regularization parameters. This is not 
inconsistent with standard regularization theory since we consider an iterative 
regularization technique which uses the information of the previous iteration
cycle. In our numerical simulations an exponentially decreasing sequence
$\alpha_n$ leads to more visually attractive image sequences. This in turn
implies that the time
steps tn of the diffusion filtering method are exponentially increasing. This com- 
pensates for the fact that in the beginning the diffusion process is rather strong 
and a small step size is required. As the diffusion progresses the image starts 
stagnating and a large time step size becomes appropriate. 
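The interplay between the regularization parameters and the time steps can be sketched in one dimension. The code below iterates the discrete optimality condition $K^T(Ku_n - y) + \alpha_n M(u_n - u_{n-1}) = 0$ with $M$ a discrete version of $-\Delta + \omega I$, and accumulates the artificial time $t_n = t_{n-1} + 1/\alpha_n$. The triangular blur kernel, the grid size, the dense solver and all function names are assumptions of this sketch; it is not the finite element implementation used for the experiments.

```python
import numpy as np

def blur_matrix(n, half_width=3):
    """Symmetric (self-adjoint) blur matrix built from a triangular kernel."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    k = np.maximum(0.0, 1.0 - dist / (half_width + 1.0))
    return k / k[n // 2].sum()          # scalar scaling keeps K symmetric

def tikhonov_morozov(K, y, alphas, omega=1.0):
    """Iterative Tikhonov-Morozov method in a discrete H^1-type setting.

    Each step minimises ||K u - y||^2 + alpha_n * (u - u_prev)^T M (u - u_prev)
    with M = D^T D + omega * I, starting from u_0 = 0. Returns a list of
    (t_n, u_n) pairs, where t_n = sum of 1 / alpha_k for k <= n.
    """
    n = K.shape[0]
    D = np.diff(np.eye(n), axis=0)
    M = D.T @ D + omega * np.eye(n)
    u = np.zeros(n)
    t = 0.0
    history = []
    for a in alphas:
        u = np.linalg.solve(K.T @ K + a * M, K.T @ y + a * (M @ u))
        t += 1.0 / a
        history.append((t, u.copy()))
    return history
```

With `alphas = 10.0 * 0.5 ** np.arange(8)` the time steps $1/\alpha_n$ double at every iteration, which is the exponentially increasing step size described above. The data residual $\|Ku_n - y\|$ cannot increase from one iterate to the next, because $u_{n-1}$ is always an admissible competitor in the $n$-th minimization.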

Typically in inverse problems the main objective of regularization techniques
is to obtain a reasonable reconstruction which is resistant to noise. Moreover,
especially for nonlinear ill-posed problems, the calculation of a scale space can be
significantly more expensive than just calculating one regularized solution.



2.1 Numerical Simulations 



In this subsection we discuss the numerical implementation of the Tikhonov- 
Morozov method and present some numerical simulations for deblurring images. 
In the numerical simulations presented below we have used the kernel function 



k{t) 



p8 



for t G [— e,e] and k{t) = 0 otherwise . 
For the numerical solution of the integro-differential equation (14) we discretize
in time and use a finite element ansatz of products of linear splines on $\Omega$. Let
v{-,tn) = Sij=o approximation of the solution of 11411 where 

Vij{x,y) = Vi{x)vj{y) and Vi is a spline of order 1 , i.e., Vi{j/n) = Sij for i = 
0, • • • , N and Vi is piecewise linear on [0, 1]. For the approximation of the time 
derivative of v we use a backward difference operator, i.e., ~ 

^{xffn) ■ Using a„ = 1/ (tn — tn- 1 ) the discretized system for an approximation 
of li 1 4ll at time requires solving the following linear equation for the coefficients 
Cij{tn+i) from given coefficients Cij(tn) 

^ ^ ) (/C -t“ (Xrffuj) — I y ^ ~\~ eXn ^ ^ (t^) 

for all l,k G {0, ••• ,N}. Here [ivi)x{vk)xivj)y{vi)y + uivtVkVjVi] and 

^ ~ [In ^ ^ In general the matrix /C is not sparse - it is only 
sparse if the essential support of the kernel function k is small. Thus in gen- 
eral the setup of /C is computationally expensive. For example, if k has infinite 
essential support, then the setup of the matrix for an image of size n x n re- 
quires 0{n^) operations. The solution of the unregularized equation 1 1 ISII (i.e., 
with = 0) is ill-conditioned. This becomes clear when the singular values 
of the matrix K, are calculated; most of the singular values are comparatively 
small. Errors in components of the data corresponding to singular functions with 
singular value near zero are then exceedingly amplified. Thus it is prohibitive to 
calculate the solution of the unregularized equation. 



Example 1. In the first example we aimed to reconstruct the pattern (left image
in Figure 1) from the blurred and additionally noisy data (cf. Figure 1). Figures
2 and 3 show the inverse scale space method for reconstructing the pattern from
blurred data (cf. Figure 1). When the blurred data is additionally distorted with
Gaussian noise the ill-posedness of the problem becomes apparent. Only for a
relatively short time period is the reconstruction visually attractive. For $t \to \infty$
the reconstruction becomes useless. One of the major concerns in regularization
theory is the estimation of appropriate regularization parameters needed to stop
the iteration process before the image becomes hopelessly distorted by noise. For
some references on appropriate stopping rules for the Tikhonov-Morozov method
we refer to mm- 



Example 2. Here we aim to compare the Tikhonov-Morozov method on $H^1(\Omega)$
and $BV(\Omega)$. We have chosen a piecewise constant function on a rectangle as a
paradigm of a function that is in $BV(\Omega)$ but not in $H^1(\Omega)$. This has
the effect that the reconstruction with the (asymptotic) Tikhonov-Morozov
method on $H^1(\Omega)$ always has a blurry character (cf. Fig. 5). Figure 6
shows the reconstruction with the Tikhonov-Morozov method on $BV(\Omega)$.
Conclusions

In this paper the iterative nonstationary Tikhonov-Morozov method and its 
asymptotic formulations for the solution of inverse problems were studied and 
diffusion filtering and regularization have been related. Different formulations 
of the Tikhonov-Morozov method have been numerically compared and several 
arguments explaining why scale space methods have so far not been used in 
general inverse problem theory have been presented. 
Fig. 1. The test pattern, the blurred data with and without noise. From the
blurred data we intend to recover the test pattern.




Fig. 2. Reconstruction from blurred data without noise by the inverse scale
space method




Fig. 3. Reconstruction from blurred data with high noise using the inverse scale
space method







Fig. 4. Test data for comparing the Tikhonov-Morozov method on $H^1(\Omega)$ and
$BV(\Omega)$. The left image shows the data to be reconstructed; the right image shows
the available blurred data.




Fig. 5. Reconstruction with the Tikhonov-Morozov method on $H^1(\Omega)$




Fig. 6. Reconstruction with the Tikhonov-Morozov method on $BV(\Omega)$
Abstract. Image analysis methods that use histograms defined over
non-zero-sized local neighbourhoods have been proposed [1-4]. To better understand
such methods, one can study the histograms of infinitesimal neighbourhoods. In 
this paper we show how the properties of such histograms can be derived 
through limit arguments. We show that in many cases the properties of these 
histograms are given by simple expressions in terms of spatial derivatives at the 
point analyzed. 



1 Introduction 

Assorted results, that build on previous work on the use of local histogram methods 
for image processing [2-5], are reported. The introduction reviews relevant back- 
ground material and establishes some formalism. The body of the paper contains new 
results on: averages and moments of local histograms, analytic solutions of the mode 
filtering equation, and information theoretic measures applied to local histograms.

1.1 Structure at a Point

The observation that ‘structure at a point’ is seemingly essential for Physics and yet 
apparently incoherent can be traced back to Zeno’s discussion [6] of the paradox of 
how an arrow in flight could at any given moment be both motionless and possessing 
of a particular velocity. Proposed resolutions of this paradox include the method of 
limits [7], instrumental approaches to physical theory [8], and formalisations of infini- 
tesimals [9]. A modern formulation of the instrumental approach is the method of 
distributions [10] in which, in a logical positivist spirit, only the results of measure- 
ment are dealt with, rather than idealised physical quantities themselves [11]. Scale 
space analysis is a particular version of the distributional method, founded upon the 
use of Gaussian apertures as operators for measuring physical scalar functions [12]. 

1.2 Differential Point Structure 

In the scale space framework, as in the standard calculus of infinitesimal neighbour- 
hoods, local structure can be probed through analysis of derivatives at the point of 
interest [13-19]. In the scale space approach these values are obtained by application 
of derivatives of the aperture function. Raw derivatives are poor descriptors of image 
structure as they vary with the coordinate system. Instead, satisfactory descriptors can 
be obtained through combination of derivative values into coordinate system
independent quantities such as the gradient magnitude $\sqrt{L_x^2 + L_y^2}$.

When the emphasis is on invariant quantities, it is convenient to employ gauge
coordinates. In particular, in $(v,w)$-gauge coordinates, where the $w$-direction is
uphill along the gradient and the $v$-direction is tangent to the isophote, the
expression for the gradient magnitude simplifies to $L_w$. The $(v,w)$-system
will be used in what follows.

1.3 Statistical Point Structure 

An alternative way to probe local structure is to consider the histogram of
values visible within an aperture [2-4] (see figure 1). For a Gaussian aperture
($G_s(\vec{r}) = (4\pi s)^{-1} e^{-\vec{r}\cdot\vec{r}/(4s)}$) of scale $s$
centred on the origin, the histogram is given by
$H_s[L](p) := \int G_s(\vec{r})\,\delta(L(\vec{r}) - p)\,d\vec{r}$. In so far as
this histogram depends upon the underlying image, it is dependent upon the
image derivatives at the origin, but in a complex manner. The relationship
does, however, simplify in the limit as the aperture size goes to zero [5].




Fig. 1. The left panel shows an example image region centred on a regular point. The central 
panel shows a gaussian aperture. The right panel shows the histogram of the image at left 
within the aperture shown in the centre. The mean, median and mode of the histogram are 
marked, as are the corresponding isophotes in the underlying image on the left. The median 
isophote divides the image into two regions with equal integral of the aperture weighting. 



As scale tends to zero, the histogram tends to a delta function (i.e.
$\lim_{s \to 0} H_s[L](p) = \delta(p - L(\vec{0}))$), which obscures analysis of
its limiting form. To better study the limit, we can use the trick of fixing
the aperture scale and resizing the image in the spatial and intensive
dimensions. Following this procedure one discovers that in the zero scale
limit, the local histogram is Gaussian in form with a width related to the
gradient (i.e. the coordinate-invariant part of the 1st order differential
structure):



lim 
The first-order manner in which this histogram evolves as the scale is increased
from 0 is captured by an expression that depends upon the second order structure: 





{K-2L^) + p\L^).G[. {^-^(O)) 



2 Averages and Moments of Local Histograms 



As has been presented elsewhere [5], from the equations in section 1.4 one can com- 
pute the mean ( [Z]j = z|oj-l- (Z,^ -I- 0 ^ 5 ^ )), median 

( /I, [Z,]j = ) ) and mode 

( ^0 [Z] j = L ( 0 ^ + (Z^ - 2L^^)s + o ) ) of local histograms for small s. It is also 

possible to calculate moment-based measures of local histograms for small s. For 
example, the variance (second moment) depends on the local gradient 

v(^s) = 2 LJs + , while the skewness (third moment normalized by the second) 



depends on the second derivative in the gradient direction k ( 5 ) = -I- o ^ . 

These results can be used for the troublesome task of estimating the median and the mode of sparse image histograms (such as result from 3x3 neighbourhoods). Consider, for example, the discretized aperture $\frac{1}{36}\begin{pmatrix} 1 & 4 & 1 \\ 4 & 16 & 4 \\ 1 & 4 & 1 \end{pmatrix}$ [20], applied to a patch of image with values $\begin{pmatrix} 100 & 40 & 25 \\ 79 & 44 & 13 \\ 59 & 28 & 0 \end{pmatrix}$, resulting in the histogram shown in figure 2. Using a standard approach, the mean would be calculated to be 42.4, but both the median and mode would be 44. Instead the skew of the histogram can be calculated to be 12.6 and this can be used to adjust the mean and arrive at 40.3 ($= 42.4 - \frac{1}{6} \times 12.6$) for the median and 36.1 ($= 42.4 - \frac{1}{2} \times 12.6$) for the mode.

Alternatively, the skewness and the mean can be used to estimate $L_{ww}$ and $L_{vv}$. To do this, the scale of the aperture must be known, but this can be readily calculated using the expression $s = \frac{1}{4}\iint m(x,y)\,(x^2 + y^2)\,dx\,dy$ which, applied to the mask above, gives an equivalent scale of $s = \frac{1}{6}$. Thus $L_{ww} = \kappa/(6s) = 12.6$ (with $\kappa = 12.6$ the skew) and $L_{vv} = 6\,(42.4 - 44 - \frac{1}{6} \times 12.6) = -22.2$.

Fig. 2. The sparse histogram corresponding to a 3x3 aperture applied to a typical image region
(see text above). The mean has been calculated in the normal manner. Under a literal interpre- 
tation of the histogram, the median and mode would both be located at the main spike; how- 
ever, the skew-based method described in the text places them at the marked locations. 
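
The worked example above can be reproduced numerically. The following is a minimal sketch, assuming the aperture weights are normalized by 36, that the skew is the weighted third central moment divided by the second, and that the small-scale relations are median = mean - skew/6, mode = mean - skew/2, skew = 6 s L_ww and mean = L(0) + s (L_vv + L_ww). The function and key names are illustrative only.

```python
import numpy as np

# Aperture weights and image patch from the worked example in the text.
mask = np.array([[1, 4, 1], [4, 16, 4], [1, 4, 1]], dtype=float) / 36.0
patch = np.array([[100, 40, 25], [79, 44, 13], [59, 28, 0]], dtype=float)


def local_histogram_estimates(patch, mask):
    """Skew-based estimates of median, mode, L_ww and L_vv for a small aperture."""
    mean = np.sum(mask * patch)
    deviations = patch - mean
    second = np.sum(mask * deviations ** 2)   # weighted second central moment
    third = np.sum(mask * deviations ** 3)    # weighted third central moment
    skew = third / second                     # third moment normalized by the second

    # Equivalent scale: s = (1/4) * sum over the mask of m(x, y) * (x^2 + y^2).
    half = mask.shape[0] // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    s = 0.25 * np.sum(mask * (xs ** 2 + ys ** 2))

    centre = patch[half, half]                # L(0), the value at the aperture centre
    l_ww = skew / (6.0 * s)
    l_vv = (mean - centre) / s - l_ww
    return {
        "mean": mean,
        "skew": skew,
        "median": mean - skew / 6.0,
        "mode": mean - skew / 2.0,
        "scale": s,
        "L_ww": l_ww,
        "L_vv": l_vv,
    }


estimates = local_histogram_estimates(patch, mask)
```

With unrounded intermediate values the estimate of L_vv comes out slightly different from the figure quoted in the text, which is computed from the rounded mean and skew.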



3 Image Simplification 

The concept of progressively simplifying an image is well established [21]. The obvious technique is to replace, iteratively and simultaneously, the value at each point of the image by the average of the values within an aperture around the point. The effect of such filtering will depend upon the size and shape of the aperture. To remove this dependency one can consider the limiting process as smaller apertures are used and
the number of iterations is increased. In the limit, the apertures become infinitesimal 
and the iteration number is replaced by a continuous time parameter. At the limit, the 
effect of filtering can only be dependent on differential measures of the image and so 
infinitesimal filtering is always describable by a partial differential equation. 

3.1 Mean, Median, and Mode Filtering 

Different definitions of average result in different filtering schemes [22-24]. Using the mean as the averaging operator results in a scheme equivalent to linear diffusion [12], described by $L_{t_2} = L_{vv} + L_{ww}$, where $t_2$ is the time parameter. Using the median causes the image to evolve (at regular points) according to $L_{t_1} = L_{vv}$, also known as mean curvature flow [25]. Using the mode results in the image evolving according to $L_{t_0} = L_{vv} - 2L_{ww}$ at regular points, and $L_{t_0} = 0$ at critical points [5].

Mode filtering is distinctly different in effect from mean or median. As mode filtering progresses, the $-2L_{ww}$ term has the effect of de-blurring and so enhancing edges, while the $+L_{vv}$ term stabilizes this process and prevents the developing loci of
discontinuity from becoming too ragged. Away from developing edges, the image 
changes in value towards nearby critical points. The final result seems to be a mosaic 
of plateaus separated by discontinuities, though this has not been proved. At this point, 
the image is unaffected by further applications of the mode filtering procedure. 
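
To make the three evolution equations concrete, the sketch below evaluates their right-hand sides on a pixel grid. It is only an illustration under stated assumptions, not the numerical scheme used for the figures: derivatives are taken with `numpy.gradient`, w denotes the gradient direction and v the isophote direction, and the function name is hypothetical. Stepping the mode equation forward with a naive explicit scheme may well be unstable, since the $-2L_{ww}$ term acts as backward diffusion, so no time stepping is attempted here.

```python
import numpy as np


def filter_velocities(image, eps=1e-12):
    """Right-hand sides of the mean, median and mode evolution equations."""
    image = np.asarray(image, dtype=float)
    ly, lx = np.gradient(image)       # axis 0 is y (rows), axis 1 is x (columns)
    lyy, lyx = np.gradient(ly)
    lxy, lxx = np.gradient(lx)

    grad_sq = lx ** 2 + ly ** 2
    safe = np.maximum(grad_sq, eps)   # avoids division by zero at critical points

    # Second derivatives along the gradient (w) and along the isophote (v).
    l_ww = (lx ** 2 * lxx + 2 * lx * ly * lxy + ly ** 2 * lyy) / safe
    l_vv = (ly ** 2 * lxx - 2 * lx * ly * lxy + lx ** 2 * lyy) / safe

    return {
        "mean": lxx + lyy,            # linear diffusion: L_vv + L_ww
        "median": l_vv,               # mean curvature flow
        "mode": l_vv - 2.0 * l_ww,    # evaluates to 0 where the gradient vanishes
    }


# Example: L(x, y) = x^2 + 3y, for which the derivatives are known exactly.
ys, xs = np.mgrid[0:9, 0:9]
velocities = filter_velocities(xs ** 2 + 3.0 * ys)
```

At the centre pixel of the example, the gradient is (8, 3) and only $L_{xx} = 2$ is non-zero, so $L_{ww} = 128/73$ and $L_{vv} = 18/73$.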



330 Lewis D. Griffin 




Fig. 3. Shows the effect of mean (left column), median (centre) and mode (right) filtering. The
first row shows the original image; the other rows reading downwards show the progressive 
effect of repeated filtering after 4, 16 and 64 iterations. The mode-filtered image at bottom right 
is the final state for this image; further mode filtering has no effect. 

Mean-filtering, being linear, is straightforward to implement; median- and mode- 
filtering, being non-linear, are not. Results from attempted numerical implementations 
of these filtering schemes are shown in figure 3. The qualitative effect of median fil- 
tering is similar to results presented by other authors under the guises of mean curva- 
ture flow [26] as well as median filtering [25]. For mode filtering, although the results 
are in rough qualitative agreement with expectation, the algorithm can only truly be 
assessed by comparison to explicit solutions of the evolution equations. In the next 
section we report progress on identifying such explicit solutions. 
3.2 Analytic Solutions to the Mode Filtering Equations

An exact solution to the mode filtering equation is provided by a family of erf functions $E(x; t_0 < c) = \mathrm{erf}\left(x\,(8(c - t_0))^{-1/2}\right)$. Since these functions are 1D, the mode filtering equation reduces to $E_{t_0} = -2E_{xx}$, the truth of which is easily established. As this family evolves, a blurred step edge becomes increasingly sharp until at $t_0 = c$ it becomes a sharp step edge ($E(x; t_0 \geq c) = \mathrm{sgn}(x)$), after which point there is no further change.
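
The claim that the erf family satisfies the 1D mode filtering equation can be checked numerically. The sketch below assumes the family has the form erf(x / sqrt(8 (c - t))) and estimates the residual E_t + 2 E_xx with central finite differences; the function names and the step size h are illustrative choices.

```python
import math


def erf_edge(x, t, c=1.0):
    """Blurred step edge erf(x / sqrt(8 (c - t))), defined for t < c."""
    return math.erf(x / math.sqrt(8.0 * (c - t)))


def mode_equation_residual(x, t, c=1.0, h=1e-3):
    """Finite-difference estimate of E_t + 2 E_xx, which should be close to zero."""
    e_t = (erf_edge(x, t + h, c) - erf_edge(x, t - h, c)) / (2.0 * h)
    e_xx = (erf_edge(x + h, t, c) - 2.0 * erf_edge(x, t, c)
            + erf_edge(x - h, t, c)) / h ** 2
    return e_t + 2.0 * e_xx


residuals = [mode_equation_residual(x, t)
             for x in (-1.5, 0.3, 0.7) for t in (0.0, 0.5)]
```

The residuals are limited only by the finite-difference truncation error, which is consistent with the family being an exact solution.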

The above solution is of some use in evaluating our numerical implementation, but it does not feature any extrema. A 1D solution that does include extrema, but is however only approximate, is $S(x; t_0 < 0) = \frac{4}{\pi}\sum_{i=0}^{\infty} \frac{(-1)^i}{2i+1}\, e^{2(2i+1)^2 t_0} \cos((2i+1)x)$. As figure 4 shows, this describes the formation of a square wave (at $t_0 = 0$) from a blurred square wave (at $t_0 < 0$). At regular points ($x \neq n\pi$) the family of functions satisfies the evolution equation $S_{t_0} = -2S_{xx}$ exactly as required. Unfortunately, the extrema change value slightly with $t_0$, which should not occur with proper mode filtering. The movement is small though (less than 0.1% for $t_0 \in [-0.05, 0]$).
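
The size of the extremum drift can be checked with a few lines of code. This sketch assumes the blurred square wave is the Fourier series of a unit square wave with each harmonic $\cos(kx)$, $k = 2i + 1$, scaled by $e^{2k^2 t}$ (the factor that makes each term satisfy $S_t = -2S_{xx}$); the truncation at 200 terms and the function name are arbitrary choices.

```python
import math


def blurred_square_wave(x, t, terms=200):
    """Truncated series (4/pi) * sum_i (-1)^i / k * exp(2 k^2 t) * cos(k x), k = 2i + 1."""
    total = 0.0
    for i in range(terms):
        k = 2 * i + 1
        total += (-1) ** i / k * math.exp(2.0 * k * k * t) * math.cos(k * x)
    return 4.0 / math.pi * total


# Value at the extremum x = 0 at the start of the quoted time interval.
peak_at_start = blurred_square_wave(0.0, -0.05)
drift = abs(1.0 - peak_at_start)
```

Under these assumptions the peak at $t = -0.05$ lies within 0.1% of its final value of 1, in line with the figure quoted in the text.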




Fig. 4. Shows a blurred square wave evolving into a sharp square wave. The evolution satisfies
the mode filtering equations except at the extrema, which change value slightly. 

4 Information Theoretic Measures 

Image registration methods based upon information theoretic measures are becoming 
common [27, 28]. In this context, information measures are used to quantify the de- 
gree to which two images are in correct registration. Optimisation routines are then 
used to explore an appropriate space of transformations to discover the best possible 
registration according to the measure. 

At the heart of any information theoretic approach is Shannon’s measure (H) of the entropy of a distribution D, defined as $H[D] = -\int D(p)\ln[D(p)]\,dp$ [29]. Entropy measures something similar to variance, especially for unimodal distributions. For example, the entropy of a Gaussian distribution is $H[G_s(\_)] = \frac{1}{2}(1 + \ln(4\pi s))$.
For image registration purposes, a measure ‘mutual information’, that is defined in terms of entropy but that applies to joint distributions, is used. The joint distribution (J) of a pair ($U, V : \Omega \to \mathbb{R}$) of registered images is defined as $J(u,v) = |\Omega|^{-1}\int_{\vec{r} \in \Omega} \delta(U(\vec{r}) - u)\,\delta(V(\vec{r}) - v)\,d\vec{r}$, where $\Omega$ is the region of overlap. The

mutual information (M) of J measures the degree to which a value of one variable (e.g. 
u) of the joint histogram is predictable from the value of the other variable (v). Good 
registration corresponds to high mutual information i.e. the values of one image can be 
well predicted from the values of the other. The formula for mutual information is: 

$$M[J] = H\left[\int J(u,\_)\,du\right] + H\left[\int J(\_,v)\,dv\right] - H[J(\_,\_)]$$
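
In practice the joint distribution is estimated with a binned joint histogram. The sketch below is one common discrete approximation, assuming mutual information is computed as the sum of the two marginal entropies minus the joint entropy; the bin count and function names are illustrative and not taken from the registration methods cited.

```python
import numpy as np


def entropy(probabilities):
    """Shannon entropy (natural log) of a discrete distribution, ignoring empty bins."""
    p = np.asarray(probabilities, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def mutual_information(u, v, bins=32):
    """Mutual information of two registered images from their joint histogram."""
    joint, _, _ = np.histogram2d(np.ravel(u), np.ravel(v), bins=bins)
    joint /= joint.sum()
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)


# Identical images are perfectly predictable from one another ...
ramp = np.arange(1024, dtype=float)
mi_identical = mutual_information(ramp, ramp)

# ... whereas two unrelated ramps share no information.
rows = np.repeat(np.arange(32, dtype=float), 32)
cols = np.tile(np.arange(32, dtype=float), 32)
mi_independent = mutual_information(rows, cols)
```

For the identical pair the result equals the marginal entropy (ln 32 with 32 equally filled bins), and for the independent pair it is zero.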

While mutual information gives good registration results most of the time, it has 
been observed that it can result in obvious mis-registration even with images not spe- 
cially constructed to fool the algorithm [30]. It has been suggested that this can be 
prevented by using a global mutual information measure combined by multiplication 
with a local measure integrated across the image domain [31]. We describe below how 
such local measures can be understood within an information theoretic framework. 



4.1 Mutual Information Defined for an Infinitesimal Aperture 



As a preliminary result, we first build on the equations given above for (i) the entropy of a Gaussian distribution, and (ii) mutual information of a 2D distribution. These equations allow one to calculate the mutual information of a 2D gaussian distribution $E(u,v) = \pi^{-1}\sqrt{AC - B^2}\; e^{-(Au^2 + 2Buv + Cv^2)}$ to be $M[E] = \frac{1}{2}\ln\left(\frac{AC}{AC - B^2}\right)$. Our next step is to consider two registered images (U and V), both regular and (for convenience) zero-valued at the origin i.e. $U(x,y) = ax + by + \ldots$ and $V(x,y) = \alpha x + \beta y + \ldots$. Just as we can define the histogram of a single image within an aperture, so we can define the joint histogram within an aperture of a pair of images. For a Gaussian aperture of scale s centred at the origin, the joint histogram of U and V is:

$$H_s[U,V](u,v) = \int G_s(\vec{r})\,\delta(U(\vec{r}) - u)\,\delta(V(\vec{r}) - v)\,d\vec{r}$$

In the limit, as we reduce the scale of the aperture down to zero, the joint histogram becomes dependent upon only the first order structure of the two images and the joint histogram tends to the form of a two-dimensional gaussian:

$$\lim_{s \to 0} H_s[U,V](u,v) = \left(4\pi s\,|a\beta - \alpha b|\right)^{-1} \exp\left(-\frac{1}{4s\,(a\beta - \alpha b)^2}\begin{pmatrix} u & v \end{pmatrix}\begin{pmatrix} \alpha^2 + \beta^2 & -(a\alpha + b\beta) \\ -(a\alpha + b\beta) & a^2 + b^2 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}\right)$$

So, using the already stated expression for the mutual information of a two-dimensional Gaussian one arrives at:

$$\lim_{s \to 0} M\left[H_s[U,V]\right] = -\frac{1}{2}\ln\left(\frac{(a\beta - b\alpha)^2}{(a^2 + b^2)(\alpha^2 + \beta^2)}\right)$$
Finally, if the angle between the two gradient vectors $\begin{pmatrix} a \\ b \end{pmatrix}$ and $\begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ is denoted $\theta$, then the expression for mutual information simplifies to:

$$\lim_{s \to 0} M\left[H_s[U,V]\right] = -\ln(|\sin\theta|).$$
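
The simplification can be verified numerically for any pair of non-parallel gradients. The sketch below assumes a 2D Gaussian with exponent $-(Au^2 + 2Buv + Cv^2)$ has mutual information $\frac{1}{2}\ln(AC/(AC - B^2))$, and that for gradients (a, b) and (alpha, beta) the coefficients are $A = (\alpha^2 + \beta^2)/q$, $B = -(a\alpha + b\beta)/q$ and $C = (a^2 + b^2)/q$ with $q = 4s(a\beta - \alpha b)^2$. The function names are illustrative.

```python
import math


def gaussian_mutual_information(A, B, C):
    """Mutual information of a 2D Gaussian with exponent -(A u^2 + 2 B u v + C v^2)."""
    return 0.5 * math.log(A * C / (A * C - B * B))


def local_mutual_information(grad_u, grad_v, s=1.0):
    """Zero-scale mutual information for two images with the given gradients."""
    a, b = grad_u
    alpha, beta = grad_v
    q = 4.0 * s * (a * beta - alpha * b) ** 2
    A = (alpha ** 2 + beta ** 2) / q
    B = -(a * alpha + b * beta) / q
    C = (a ** 2 + b ** 2) / q
    return gaussian_mutual_information(A, B, C)


def angle_based_measure(grad_u, grad_v):
    """-ln|sin(theta)|, with theta the angle between the two gradient vectors."""
    a, b = grad_u
    alpha, beta = grad_v
    sin_theta = (a * beta - alpha * b) / (math.hypot(a, b) * math.hypot(alpha, beta))
    return -math.log(abs(sin_theta))


mi = local_mutual_information((3.0, 1.0), (-1.0, 2.0))
mi_other_scale = local_mutual_information((3.0, 1.0), (-1.0, 2.0), s=0.01)
angle_measure = angle_based_measure((3.0, 1.0), (-1.0, 2.0))
```

The result does not depend on the scale s, since s cancels in the ratio $AC/(AC - B^2)$.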



This expression reminds us of the expression $\frac{1}{2}(1 + \cos(2\theta))$ used as a measure of

local registration quality in a recently proposed [31] improvement to standard global 
mutual information measures. The two measures are compared in figure 5. 




Fig. 5. Compares two measures of local registration. The sinusoidal dark curve is $\frac{1}{2}(1 + \cos(2\theta))$, the paler curves are our alternative $-\ln(|\sin\theta|)$.



5 Concluding Remarks 

In this paper, we have defined local histograms as the limit of histograms defined for 
gaussian apertures as the scale of the aperture tends to zero. We have shown how 
properties of these local histograms are, in many cases, given by simple expressions in 
terms of spatial derivatives of the image at the aperture centre. 

We have given examples of the use of local histogram operations in three areas: 
image measurement, image filtering and image registration. In all three cases, the 
details of the actual implementation of these methods on a discrete grid of pixels is ad 
hoc. 

Several questions are raised by this approach. Consider, for example, the following. 
The equation that was presented for the first order deviation of the local histogram from gaussian form depended upon $L_w$, $L_{vv}$ and $L_{ww}$ but not $L_{vw}$. Which other derivatives is the form of the local histogram unaffected by? Or to turn the question around,
what characterizes the portion of the differential structure at a point that is discover- 
able from examination of the local histogram at the point? 
Abstract. In this paper, we investigated the mechanism of dividing a 2D-
object border into a set of local and global indentation and protrusion segments 
by extending the classic curvature scale-space filtering method. The resultant 
segments, arranged in hierarchical structures, can represent the object shape. 
Applying this technique, we derived a border irregularity measure for 
pigmented skin lesions. The measure correlated well with experienced 
dermatologists’ evaluations and may be useful for measuring the malignancy of 
the lesion. Furthermore, we can use the method to discover all the bays in an
aerial map. 



1 Introduction 

Shape decomposition is an important technique for computer vision or image 
understanding systems. Dividing an object into parts forms a logical hierarchical 
structure of the part shapes, which can help us understand the object. 

There are many approaches to partitioning an object. Generalized-cylinders [1] and
superquadrics [2, 3] methods model shape parts by predefined geometric primitives.
Blum and Nagel [4] proposed to divide an object according to its symmetric axes.
The high curvature points of an object border, which are considered to possess high 
information content [5], have also been used for shape decomposition. Hoffman and 
Richards [6, 7] partitioned an object border at the concave tips. Siddiqi and Kimia’s 
[8] neck-based and limb-based approach to object decomposition also put the 
terminals of part-lines at the concave tips. However, the above methods cannot 
produce a full set of indentation and protrusion segments. 

In this paper, we present an algorithm, which is an extension of the classic 
curvature scale-space filtering technique, to partition a 2D planar curve into two sets 
of local and global indentation and protrusion segments. Then we discuss two 
applications for such a boundary decomposition technique. The first application is to 
measure the border irregularity of a pigmented skin lesion, which may indicate the
malignancy of the lesion. Another application is to detect a set of bays, arranged in a 
hierarchical structure, from aerial maps. 

The paper is organized as follows: Sect. 2 briefly describes the classic curvature 
scale-space filtering technique. Sect. 3 defines indentation and protrusion segments. 
Sect. 4 presents the algorithm for detecting all indentation and protrusion segments.
Sect. 5 shows the duality between the classic and the extended curvature scale-space 
images. Sect. 6 discusses the two applications and Sect. 7 concludes the discussion. 



2 Classic Curvature Scale-Space 



The classic curvature scale-space filtering technique extracts curvature zero-crossing 
points from a 2D-object border in a multi-scale environment [9, 10]. The idea begins 
with a smoothing process of the object border L(t), which is parameterized by the path 
length variable t and is in C^. The smoothing process is achieved by a series of 
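Gaussian convolutions with a family of kernels $g(t, \sigma)$ of increasing $\sigma$. The curvature function $K(t, \sigma)$ of the smoothed border $L(t, \sigma) = (X(t, \sigma), Y(t, \sigma))$ is defined as:¹

$$K(t,\sigma) = \frac{\dfrac{\partial X}{\partial t}\dfrac{\partial^2 Y}{\partial t^2} - \dfrac{\partial^2 X}{\partial t^2}\dfrac{\partial Y}{\partial t}}{\left[\left(\dfrac{\partial X}{\partial t}\right)^2 + \left(\dfrac{\partial Y}{\partial t}\right)^2\right]^{3/2}} \qquad (1)$$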



During the smoothing process, $\sigma$ controls the amount of smoothing. At some large
$\sigma$, all concavities on the border are removed and the process is terminated. Fig. 1
demonstrates the smoothing process of a planar closed curve. 
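
The smoothing and curvature computation can be sketched for a sampled closed border as follows. This is an illustration under stated assumptions rather than the implementation used for the figures: the border is sampled at equal parameter steps, the smoothing level is measured in samples, the Gaussian convolution and the derivatives are carried out in the Fourier domain (which treats the curve as periodic), and the curvature is the standard expression $(X'Y'' - X''Y')/(X'^2 + Y'^2)^{3/2}$. The sign of the result depends on the tracing direction and coordinate system. All function names are hypothetical.

```python
import numpy as np


def smoothed_curvature(x, y, sigma):
    """Curvature of a closed curve after Gaussian smoothing with std `sigma` (in samples)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    freqs = np.fft.fftfreq(len(x))                           # cycles per sample
    gaussian = np.exp(-2.0 * (np.pi * freqs * sigma) ** 2)   # transform of the kernel
    omega = 2j * np.pi * freqs                               # differentiation multiplier

    def smooth_derivative(signal, order):
        spectrum = np.fft.fft(signal) * gaussian * omega ** order
        return np.fft.ifft(spectrum).real

    xt, yt = smooth_derivative(x, 1), smooth_derivative(y, 1)
    xtt, ytt = smooth_derivative(x, 2), smooth_derivative(y, 2)
    return (xt * ytt - xtt * yt) / (xt ** 2 + yt ** 2) ** 1.5


def zero_crossings(curvature):
    """Indices i where the curvature changes sign between samples i and i + 1 (cyclic)."""
    k = np.asarray(curvature, dtype=float)
    return np.nonzero(k * np.roll(k, -1) < 0)[0]


# A circle of radius 2 has constant curvature of magnitude 1/2 and no zero-crossings.
n, radius = 256, 2.0
theta = 2.0 * np.pi * np.arange(n) / n
circle_x, circle_y = radius * np.cos(theta), radius * np.sin(theta)
k_raw = smoothed_curvature(circle_x, circle_y, sigma=0.0)
k_smooth = smoothed_curvature(circle_x, circle_y, sigma=5.0)
```

Repeating the computation for a range of increasing smoothing levels and recording the zero-crossing positions at each level gives the rows of a scale-space image.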



[Fig. 1 panels: original border; sigma = 16; sigma = 40; sigma = 72; sigma = 129]

Fig. 1. Gaussian smoothing process for a planar closed curve. The initial parameterization point is marked as ’x’ in each subfigure. The smoothing $\sigma$ level is specified at the top of the subfigure. The $\sigma$ level at which all concavities are removed is 129 for this example.



For a smoothed border, the curvature zero-crossings are the points that satisfy the 
following conditions: 



$$K(t,\sigma) = 0, \qquad \frac{\partial K(t,\sigma)}{\partial t} \neq 0. \qquad (2)$$



Zero-crossings of all smoothed borders are computed and a 2D scale-space image is 
employed to record the captured feature points. Fig. 2a shows the classic curvature 
scale-space image for Fig. 1. 



¹ With our convention, using counterclockwise tracing along the border and image coordinate
system (i.e. the origin is in the top-left corner), positive curvature values imply concavity, 
while negative curvature values imply convexity. 
Fig. 2. The classic (a) and extended (b) curvature scale-space images for the Gaussian
smoothing process shown in Fig. 1. (c) The overlay of (a) and (b).



The classic curvature scale-space image has been used to match 2D objects from a 
database [9, 10] and to detect corners from an image [11]. However, these systems 
are not designed to analyse indentation and protrusion curve segments. 



3 Definition of Indentation and Protrusion Segments 

To define indentation and protrusion curve segments, we exploit the characteristics of the curvature function $K(t)$ of a curve function $L(t)$. The curvature function portrays the curve in two ways. The sign of $K(t)$ indicates the type of bending (concavity or convexity) at the point t and the magnitude denotes the amount of bending. Local curvature extrema, located by the zero-crossings of the first derivative of $K(t)$ with respect to t ($K'(t) = 0$, $K''(t) \neq 0$), mark the tip points of the curve segment, whose type is determined by the corresponding sign of $K(t)$. Therefore, we define an indentation/protrusion segment as a curve segment composed of three consecutive local curvature extrema $[t_1, t_2, t_3]$. The middle curvature extremum $t_2$ determines the segment tip point and the segment type. For example, when $t_2$ is a concave curvature extremum, $K(t_2) > 0$, the corresponding segment is an indentation segment. The local curvature extrema $t_1$ and $t_3$ delimit the extent of the segment. $K(t_1)$ and $K(t_3)$ have the same sign, which differs from that of $K(t_2)$.
4 Algorithm for Detecting Indentation and Protrusion Segments

The details of the algorithm for detecting all indentation and protrusion segments 
have been published elsewhere [12]; therefore, only an overview is presented here. 

To analyse indentation/protrusion segments, local curvature extrema are chosen to be the investigated feature of our extended curvature scale-space image. These curvature extrema are defined as the zero-crossings of the partial derivative of $K(t, \sigma)$ with respect to t, i.e.

$$\frac{\partial K(t,\sigma)}{\partial t} = 0, \qquad \frac{\partial^2 K(t,\sigma)}{\partial t^2} \neq 0. \qquad (3)$$



Also, the scale-space image is extended from a binary image to a three-valued 
image to encode the concavity or convexity property of the curvature extrema. Such 
an extended scale-space image for the smoothing process of Fig. 1 is depicted in Fig. 
2b, where concave curvature extrema are denoted by shaded thick points (red in the 
online version) and convex curvature extrema are denoted by solid thin black points. 
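
One row of such a three-valued image can be sketched as follows. This is a simplified illustration, not the published algorithm of [12]: it assumes the curvature of one smoothed border is available as a sampled cyclic array, detects extrema as sign changes of the discrete slope, and uses the convention that positive curvature means concavity. The labels +1, -1 and 0 and the function name are arbitrary choices.

```python
import numpy as np


def classify_curvature_extrema(curvature):
    """Label samples of a closed curve: +1 concave extremum, -1 convex extremum, 0 otherwise."""
    k = np.asarray(curvature, dtype=float)
    forward = np.roll(k, -1) - k            # K[i+1] - K[i], wrapping around the curve
    backward = k - np.roll(k, 1)            # K[i] - K[i-1]
    is_extremum = forward * backward < 0    # slope changes sign: strict local max or min
    labels = np.zeros(len(k), dtype=int)
    labels[is_extremum & (k > 0)] = 1       # extremum with positive curvature (concave)
    labels[is_extremum & (k < 0)] = -1      # extremum with negative curvature (convex)
    return labels


sample_curvature = [-1.0, -2.0, -1.0, 1.0, 3.0, 1.0, -0.5]
row = classify_curvature_extrema(sample_curvature)
```

Stacking such rows for increasing smoothing levels would give an image in which the two extremum types can be drawn with different markers.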

From our extended curvature scale-space image, we can capture all 
indentation/protrusion segments as defined in Sect. 3 for the entire smoothing process. 
Furthermore, our scale-space image also reveals the evolution of the 
indentation/protrusion segments. Because of the causality property of Gaussian 
smoothing [13, 14], segments are smoothed out in a 'proper' order: small ones 
disappear before larger ones. When some smaller segments are smoothed out, they 
may merge into some larger segments. The larger segments are considered as the 
global segments to the smaller local ones. Hence, indentation/protrusion segments 
can be grouped into hierarchical structures. In addition, the true location of an 
indentation/protrusion segment can be pinpointed by coarse-to-fine tracking of the
segment to the zeroth-scale, the original non-smoothed curve. 



5 Classic and Extended Curvature Scale-Space Images 

The classic and the extended curvature scale-space images form a dual space because 
these two images are constructed by different feature points of the smoothed borders. 
Fig. 2c depicts the overlay of Fig. 2a and 2b. In this section, we present the parallel 
properties and the differences for these two images. 

Property 1a: In classic curvature scale-space images, the apex of a contour arc is the point $(\tau, \zeta)$ such that $K(\tau, \zeta) = 0$ and $\partial K(\tau, \zeta)/\partial t = 0$.²

For any $\sigma$ in the interval $[0, \zeta)$ in the classic curvature scale-space image, let the points $t_1$ and $t_2$ be the curvature zero-crossings at the two sides of the contour arc. Since $K(t_1, \sigma) = K(t_2, \sigma) = 0$ and $K$ is a continuous function, according to Rolle's Theorem, there exists a point $t_3$ such that $t_1 < t_3 < t_2$ and $\partial K(t_3, \sigma)/\partial t = 0$ in K-t space. At the smoothing level $\zeta$, the points $t_1$, $t_2$ and $t_3$ merge together to the point $\tau$. Because $K$ is continuous, $K(\tau, \zeta) = 0$ and $\partial K(\tau, \zeta)/\partial t = 0$.

² Note that the apex point $(\tau, \zeta)$ of a contour arc is not selected in the classic curvature scale-space process due to the definition of the process as expressed in Eqn 2. However, the property of the point can be derived.

Property 1b: In extended curvature scale-space images, the apex of a contour arc is the point $(\tau, \zeta)$ such that $\partial K(\tau, \zeta)/\partial t = 0$ and $\partial^2 K(\tau, \zeta)/\partial t^2 = 0$.³

For any $\sigma$ in the interval $[0, \zeta)$ in the extended curvature scale-space image, let the points $t_1$ and $t_2$ be the curvature extrema at the two sides of the contour arc. Since $\partial K(t_1, \sigma)/\partial t = \partial K(t_2, \sigma)/\partial t = 0$ and $\partial K/\partial t$ is a continuous function, according to Rolle's Theorem, there exists a point $t_3$ such that $t_1 < t_3 < t_2$ and $\partial^2 K(t_3, \sigma)/\partial t^2 = 0$ in $\partial K/\partial t$-t space. At the smoothing level $\zeta$, the points $t_1$, $t_2$ and $t_3$ merge together to the point $\tau$. Because $\partial K/\partial t$ is continuous, $\partial K(\tau, \zeta)/\partial t = 0$ and $\partial^2 K(\tau, \zeta)/\partial t^2 = 0$.

Property 2a: In classic curvature scale-space images, excluding the apex point, one side of a contour arc has the property $\partial K/\partial t > 0$ and the other side of the contour arc has the property $\partial K/\partial t < 0$.

Assume the contour apex is the point $(\tau, \zeta)$ in the classic curvature scale-space image. For any $\sigma$ in the interval $[0, \zeta)$ of the smoothing axis, let the points $t_1$ and $t_2$ be the curvature zero-crossings at the two sides of the contour arc. By definition, $K(t_1, \sigma) = 0$ and $\partial K(t_1, \sigma)/\partial t \neq 0$. Without loss of generality, we assume $\partial K(t_1, \sigma)/\partial t > 0$. In other words, $K$ crosses zero from below at $t_1$ in the K-t space. Because $K$ is continuous, for $K$ to cross the next zero at $t_2$, $K$ must cross zero from above, i.e. $\partial K(t_2, \sigma)/\partial t < 0$. Otherwise, there exists a curvature zero-crossing in between $t_1$ and $t_2$, which contradicts the classic curvature scale-space process. Therefore, the partial derivatives $\partial K(t_1, \sigma)/\partial t$ and $\partial K(t_2, \sigma)/\partial t$ must have different signs.

To complete our argument for the property, we have to show that if $\partial K(t_1, \sigma)/\partial t > 0$, all curvature zero-crossings in the same side of the contour arc must have the property $\partial K/\partial t > 0$. Since $\partial K(t_1, \sigma)/\partial t > 0$ and $\partial K/\partial t = 0$ only at the contour apex $(\tau, \zeta)$, moving along the contour arc from $(t_1, \sigma)$ to $(\tau, \zeta)$ in the $\partial K/\partial t$ surface cannot go to negative because $\partial K/\partial t$ is continuous. Therefore, the curvature zero-crossings along the same side as $t_1$ have the property $\partial K/\partial t > 0$.

Property 2b: In extended curvature scale-space images, excluding the apex point, one side of a contour arc has the property $\partial^2 K/\partial t^2 > 0$ and the other side of the contour arc has the property $\partial^2 K/\partial t^2 < 0$.

The argument is parallel to property 2a if we can show $\partial^2 K/\partial t^2$ is a continuous
function. Since the border Lq is C^, the smoothed border L{t, ct) and curvature K are 

and d^K/dt^ is C‘ . Therefore, d^K/dt^ is a continuous function. 

Assume the contour apex is the point $(\tau, \zeta)$ in the extended curvature scale-space image. For any $\sigma$ in the interval $[0, \zeta)$ of the smoothing axis, let the points $t_1$ and $t_2$ be the curvature extrema at the two sides of the contour arc. By definition, $\partial K(t_1, \sigma)/\partial t = 0$, $\partial^2 K(t_1, \sigma)/\partial t^2 \neq 0$. Without loss of generality, we assume $\partial^2 K(t_1, \sigma)/\partial t^2 > 0$. In other words, $\partial K/\partial t$ crosses zero from below at $t_1$ in the $\partial K/\partial t$-t space. Because $\partial K/\partial t$ is continuous, for $\partial K/\partial t$ to cross the next zero at $t_2$, $\partial K/\partial t$ must cross zero from above, i.e., $\partial^2 K(t_2, \sigma)/\partial t^2 < 0$. Otherwise, there exists a curvature extremum in between $t_1$ and $t_2$, which contradicts the extended curvature scale-space process. Therefore, $\partial^2 K(t_1, \sigma)/\partial t^2$ and $\partial^2 K(t_2, \sigma)/\partial t^2$ must have different signs.

To complete our argument for the property, we have to show that if $\partial^2 K(t_1, \sigma)/\partial t^2 > 0$, all curvature extrema in the same side of the contour arc must have the property $\partial^2 K/\partial t^2 > 0$. Since $\partial^2 K(t_1, \sigma)/\partial t^2 > 0$ and $\partial^2 K/\partial t^2 = 0$ only at the contour apex $(\tau, \zeta)$, moving along the contour arc from $(t_1, \sigma)$ to $(\tau, \zeta)$ in the $\partial^2 K/\partial t^2$ surface cannot go to negative because $\partial^2 K/\partial t^2$ is continuous. Therefore, the curvature extrema along the same side as $t_1$ have the property $\partial^2 K/\partial t^2 > 0$.

³ Note that the apex point $(\tau, \zeta)$ of a contour arc is not selected in the extended curvature scale-space process due to the definition of the process as expressed in Eqn 3. However, the property of the point can be derived.

Property 3: In the contours of an extended curvature scale-space image, the points 
where the concave extrema and convex extrema meet are the zero curvature points. 

The curvature of a convex curvature extremum is less than 0 and the curvature of a 
concave curvature extremum is greater than 0; hence, the meeting point has the 
property of zero curvature. The Points Ac and A4 in Fig. 2b are the examples of such 
points. 

Even though the extended curvature scale-space process computes the locations of 
curvature extrema, some curvature zero-crossings can be easily identified using the 
three-valued scale-space image. However, there is no corresponding property for the 
classic curvature scale-space image. 

Property 4a: In classic curvature scale-space images, all curvature zero-crossings disappear at $\sigma_{term}$.

When a Gaussian smoothing process terminates at $\sigma_{term}$, the object border is transformed into an oval shape with convex curvature for the entire border (i.e. $K(t, \sigma_{term}) < 0$ for all t); therefore, all curvature zero-crossings disappear.

Property 4b: In extended curvature scale-space images, all curvature extrema may disappear (a special case of a circle) or at least 4 curvature extrema remain at $\sigma_{term}$.

When a Gaussian smoothing process terminates at $\sigma_{term}$, the object border is transformed into an oval shape with convex curvature for the entire border (i.e. $K(t, \sigma_{term}) < 0$ for all t). In a special case, $K(t, \sigma_{term})$ is a negative constant (i.e. a circle) and there will be no curvature extremum. Otherwise, curvature extrema must exist. Since an ellipse has 4 curvature extrema, at least 4 curvature extrema must remain at $\sigma_{term}$ for the oval-shaped border.



6 Applications 

6.1 Differentiating Malignant Melanomas from Benign Nevi 

The indentation and protrusion segments obtained from an object border can be used 
to describe the object shape. One of the applications for the technique is to analyse 
the border irregularity of the 2D projection of an object. In particular, this technique 
has been used to measure the border irregularity of skin pigmented lesions, commonly 
known as moles, which may indicate the malignancy of the lesion [12, 15, 16]. 

Moles are mostly benign; however, some of them are malignant melanomas, the 
most fatal form of skin cancer. Benign moles usually have a round or oval shape with 
regular contour and uniform colour. Fig. 3a shows a typical benign nevus. On the 
other hand, malignant melanomas are usually described as enlarged lesions with 
multiple shades of colours. Furthermore, their borders tend to be irregular and
asymmetric with protrusions and indentations [17, 18]. Fig. 3b shows a malignant 
melanoma. 




(a) (b) 



Fig. 3. Pigmented skin lesion (a) Benign nevus (b) Malignant melanoma. 

Among the clinical characteristics (size, colour and irregular border shape) of skin 
pigmented lesions, border irregularity is one of the important clinical features 
differentiating benign nevi from malignant melanomas. There are two types of border 
irregularity: texture and structure irregularities. Texture irregularities are the small 
variations along the border, while structure irregularities are the global indentations 
and protrusions that may suggest either the unstable growth in a lesion or regression 
of a melanoma. An accurate measurement of structure irregularities is essential to 
detect the malignancy of melanoma [19]. 




(a) (b)



Fig. 4. The largest indentation (a) and protrusion (b) for a lesion border. 



Our extended curvature scale-space filtering technique can be used to measure the 
structure border irregularity of a pigmented skin lesion by locating a set of global 
indentation/protrusion segments along the border. An area-based index, called 
irregularity index, is generated to measure the severity of irregularity for each 
segment. We compare the area difference between the smoothed segment at the 
smooth-out sigma level and the original non-smoothed segment. The ratio of the 
affected area difference over the area of the smoothed object is used to define the 
irregularity index of a segment [12]. For example, Fig. 4 shows the affected area
difference (shaded) of a segment in the smoothing process, between the lesion border 
(shown by the solid line) and a smoothed border (shown by the dashed line) at the 
smooth-out level for the largest indentation (a) or protrusion (b). The overall 
irregularity index is computed by summing all individual irregularity indices. 
Because all global irregular segments are analysed, the measure is sensitive to 
structure irregularities. A user study showed that the overall irregularity index
correlated well with experienced dermatologists' evaluations of the malignancy of a 
lesion. The preliminary results of the user study have been reported in [16]. 
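
The area-based index can be sketched with a polygon-area computation. The exact construction is given in [12]; the version below is only an assumed simplification in which the original and smoothed versions of a segment are polylines sharing their endpoints and not crossing each other, so that the affected area is the area of the loop they enclose. All function names are hypothetical.

```python
def polygon_area(points):
    """Unsigned area of a closed polygon given as a list of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def segment_irregularity_index(original_segment, smoothed_segment, smoothed_border):
    """Area between the two versions of a segment, relative to the smoothed object's area."""
    loop = list(original_segment) + list(reversed(smoothed_segment))
    return polygon_area(loop) / polygon_area(smoothed_border)


def overall_irregularity_index(segments):
    """Sum of the indices; each item is (original_segment, smoothed_segment, smoothed_border)."""
    return sum(segment_irregularity_index(o, s, b) for o, s, b in segments)


# A triangular notch of area 1 cut into a smoothed 2 x 2 square.
notch = [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)]
flat = [(0.0, 0.0), (2.0, 0.0)]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, -2.0), (0.0, -2.0)]
index = segment_irregularity_index(notch, flat, square)
```

In the toy example the notch removes one unit of area from a smoothed object of area four, giving an index of 0.25.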



6.2 Detecting Bays from Aerial Maps 

To analyse aerial maps, one may represent a bay of the landmass by an indentation 
segment of the coastline. When all local and global indentation segments of the 
coastline are computed and organized in a hierarchical structure, the hierarchical 
structure of the bays can be detected.⁴ For example, the British coastline, shown in
Fig. 5a, can be divided into 7 global bay areas, according to the algorithm discussed 
in Sect. 4. As we move down the hierarchical structure of one of the global bays in 
the west side of the British coastline as shown in Fig. 5b, smaller bays are discovered. 




Fig. 5. (a) British coastline, (b) Hierarchical structure of bays (highlighted) at the west side of
the British coastline. 



7 Conclusions 

We presented an extended curvature scale-space filtering technique to partition a 2D 
planar-closed curve into a set of indentation and protrusion segments, which can be 
used to describe the shape of the object. The extended and classic techniques form a 
dual space and their similarities and differences have been compared. Two 
applications for the extended technique have been discussed. A stable border irregularity index for pigmented skin lesions can be derived. Preliminary results showed that the index correlated well with experienced dermatologists’ evaluations of the malignancy of a lesion. Also, we can discover the bays of a coastline by computing all indentation segments and organizing them into a hierarchical structure.

⁴ A set of peninsulas can also be detected if the protrusion segments are analysed.
Abstract. Gabor Analysis is frequently used for texture analysis and segmentation. Once the Gaborian feature space is generated it may be interpreted in various ways for image analysis and segmentation. Image segmentation can also be obtained via the application of “snakes” or active contour mechanism, which is usually used for gray-level images. In this study we apply the active contour method to the Gaborian feature space of images and obtain a method for texture segmentation. We calculate six localized features based on the Gabor transform of the image. These are the mean and variance of the localized frequency, orientation
and intensity. This feature space is presented, via the Beltrami frame- 
work, as a Riemannian manifold. The stopping term, in the geodesic 
snakes mechanism, is derived from the metric of the features manifold. 
Experimental results obtained by application of the scheme to test im- 
ages are presented. 



1 Introduction 

The Gaborian approach to image processing and analysis has been motivated by biological principles of image representation at the level of the primary visual cortex. The Gabor framework has been extensively used over the last fifteen years for texture analysis and segmentation.

The motivation for using Gabor filters in texture analysis is twofold. First, it appears as though simple cells in the visual cortex can be well modeled by Gabor functions, and that the Gabor scheme provides a suitable representation for visual information in the combined frequency-position space. Second, the Gabor representation is optimal in the sense of minimizing the joint two-dimensional uncertainty in the combined spatial-frequency space.

Once the Gabor feature space of an image is generated, it may be used for
texture segmentation. The fundamental question is how to extract the features
which enable us to discriminate between textures. Porat and Zeevi have
proposed to extract six localized features that can describe textures: the first
two moments of the spatial frequency, orientation of the spatial frequency and
intensity information. The six resulting features were used for texture analysis
and synthesis.

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 344 ff., 2001.

Segmentation is another important task in image processing. Since the
introduction of "snakes" or active contours, this method has been extensively used
for boundary detection in gray-level images. In this framework an initial contour
is deformed towards the boundary of an object to be detected. The evolution
equation is derived from minimization of an energy functional, which obtains a
minimum for a curve located at the boundary of the object.

The geodesic or geometric active contours model offers a different
perspective for solving the boundary detection problem: it is based on the
observation that the energy minimization problem is equivalent to finding a geodesic
curve in a Riemannian space whose metric is derived from the image contents.
The geodesic curve can be found via a geometric flow. Utilization of the
Osher and Sethian level set numerical algorithm allows automatic handling of
changes of topology.

This snakes model was extended to vector-valued active contours to
handle more complex scenery such as color images and multi-texture images.
Some recent related work includes that of Paragios and Deriche, who
generate the image texture feature space by filtering the image using Gabor
filters. Texture information is then expressed using statistical measurements,
and segmentation is achieved by application of geodesic snakes to the statistical
feature space. Shah developed and applied curve evolution and segmentation
algorithms where anisotropic metrics were considered. Lorigo et al. used both
image intensity and its variance for MRI image segmentation.

It was shown recently that the Gaborian spatial-feature space can be
described, via the Beltrami framework, as a 4D Riemannian manifold embedded in
$\mathbb{R}^6$. Based on this approach we aim to generalize the intensity based geodesic
active contours method to the Gabor-feature space of images. The stopping term,
in the geodesic snakes mechanism, is generalized and derived from the metric of
the Gabor spatial-feature manifold. We treat the localized texture features
suggested by Porat and Zeevi as a multi-valued image and apply the geodesic
snakes mechanism to it.



2 Geodesic Active Contours 

In this section we review the geodesic and geometric active contours method in
the context of gray-level images.

Let $C(q) : [0,1] \to \mathbb{R}^2$ be a parametrized curve, and let
$I : [0,a] \times [0,b] \to \mathbb{R}^+$ be the given image. Let
$E(r) : [0,\infty[ \to \mathbb{R}^+$ be an inverse edge detector, so that
$E$ approaches zero when $r$ approaches infinity. Minimizing the energy functional
proposed in the classical snakes is generalized to finding a geodesic curve in a
Riemannian space by minimizing:

$$L_R = \int E(|\nabla I(C(q))|)\,|C'(q)|\,dq. \qquad (1)$$
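The resultant evolution equation is the gradient descent flow:

$$\frac{\partial C(t)}{\partial t} = E(|\nabla I|)\,\kappa\,\mathbf{N} - (\nabla E \cdot \mathbf{N})\,\mathbf{N}, \qquad (2)$$

where $\kappa$ denotes curvature and $\mathbf{N}$ is the unit normal to the curve.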

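Defining a function $U$, such that $C = \{(x,y) \mid U(x,y) = 0\}$, we may use the
Osher-Sethian level-sets approach and obtain an evolution equation for the
embedding function $U$:

$$\frac{\partial U(t)}{\partial t} = |\nabla U|\,\mathrm{Div}\!\left(E(|\nabla I|)\,\frac{\nabla U}{|\nabla U|}\right). \qquad (3)$$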



A popular choice for the stopping function $E(|\nabla I|)$ is given by:

$$E(I) = \frac{1}{1 + |\nabla I|^2},$$

but other functions have been considered as well.
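
The stopping function above can be illustrated with a minimal sketch. It assumes a gray-level image stored as a list of rows (at least 2x2) and estimates the gradient with finite differences; the function name and the discretization are illustrative choices, not part of the original method description.

```python
def stopping_function(image):
    """Return E = 1 / (1 + |grad I|^2) for a 2D list-of-lists image (at least 2x2)."""
    rows, cols = len(image), len(image[0])
    E = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Central differences inside the image, one-sided at the border.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            ix = (image[y][x1] - image[y][x0]) / (x1 - x0)
            iy = (image[y1][x] - image[y0][x]) / (y1 - y0)
            E[y][x] = 1.0 / (1.0 + ix * ix + iy * iy)
    return E
```

On a vertical step edge, E stays at 1 in the flat regions and drops sharply at the step, which is what slows the evolving contour there.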



3 Feature Space and Gabor Transform 

The Gabor scheme and Gabor filters have been studied by numerous researchers
in the context of image representation, texture segmentation and image retrieval.
A Gabor filter centered at the 2D frequency coordinates $(U,V)$ has the general
form of:

$$h(x,y) = g(x',y')\,\exp(2\pi i(Ux + Vy)), \qquad (4)$$

where

$$(x',y') = (x\cos\phi + y\sin\phi,\; -x\sin\phi + y\cos\phi), \qquad (5)$$

the 2D Gaussian window is

$$g(x,y) = \frac{1}{2\pi\lambda\sigma^2}\exp\!\left(-\frac{(x/\lambda)^2 + y^2}{2\sigma^2}\right), \qquad (6)$$

$\lambda$ is the aspect ratio between the $x$ and $y$ scales, $\sigma$ is the scale parameter, and the
major axis of the Gaussian is oriented at angle $\phi$ relative to the $x$-axis and to
the modulating sinewave gratings.

The Fourier transform of the Gabor function is, accordingly:

$$H(u,v) = \exp\!\left\{-2\pi^2\sigma^2\left((u' - U')^2\lambda^2 + (v' - V')^2\right)\right\}, \qquad (7)$$

where $(u',v')$ are rotated frequency axes and $(U',V')$ are the rotated coordinates
of the central frequency. Thus, $H(u,v)$ is a bandpass Gaussian with minor axis
oriented at angle $\phi$ from the $u$-axis, and the radial center frequency $F$ is defined
by $F = \sqrt{U^2 + V^2}$, with orientation $\theta = \arctan(V/U)$. Since maximal
resolution in orientation is desirable, the filters whose sine gratings are cooriented
with the major axis of the modulating Gaussian are usually considered ($\phi = \theta$
and $\lambda > 1$), and the Gabor filter is reduced to: $h(x,y) = g(x,y)\exp(2\pi i F x)$.
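
As a minimal sketch of the reduced filter, the following evaluates $h(x,y) = g(x,y)\exp(2\pi i F x)$ at a single point. The Gaussian window used here (x-scale $\lambda\sigma$, y-scale $\sigma$, unit integral) is an assumption chosen to agree with the Fourier transform in Eq. (7); the function and parameter names are illustrative only.

```python
import cmath
import math

def gabor(x, y, F, sigma, lam):
    """Cooriented Gabor filter h(x, y) = g(x, y) * exp(2*pi*i*F*x) at one point."""
    # Assumed Gaussian window: x-scale lam*sigma, y-scale sigma, unit integral.
    g = math.exp(-((x / lam) ** 2 + y ** 2) / (2 * sigma ** 2))
    g /= 2 * math.pi * lam * sigma ** 2
    # Complex sinusoidal carrier along the x-axis with radial frequency F.
    return g * cmath.exp(2j * math.pi * F * x)
```

Sampling this function on a grid gives the real (even) and imaginary (odd) parts of the filter kernel.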

It is possible to generate Gabor-Morlet wavelets from a single mother Gabor
wavelet by transformations such as translations, rotations and dilations. We can
generate, in this way, a set of filters for a known number of scales, $S$, and
orientations, $K$. We obtain the following filters for a discrete subset of transformations:

$$h_{mn}(x,y) = a^{-m}\,h\!\left(\frac{x'}{a^m}, \frac{y'}{a^m}\right), \qquad (8)$$

where $(x',y')$ are the spatial coordinates rotated by $n\pi/K$ and $m = 0 \ldots S-1$.
Alternatively, one can obtain Gabor wavelets by logarithmically distorting the
frequency axis or by incorporating multiwindows. In the latter case one
obtains a more general scheme wherein subsets of the functions constitute either
wavelet sets or Gaborian sets.

The feature space of an image is obtained by the inner product of this set of
Gabor filters with the image:

$$W_{mn}(x,y) = R_{mn}(x,y) + iJ_{mn}(x,y) = (I * h_{mn})(x,y). \qquad (9)$$



Next, we follow Porat and Zeevi and extract six localized texture features
from the Gabor feature space: dominant localized frequency (denoted MF),
variance of the dominant localized frequency (VF), dominant orientation (MT),
variance of the dominant orientation (VT), mean of the local intensity (MI)
and variance of the localized intensity level (VI). This selection is based on the
assumption that the primitives of natural textures are localized frequency
components in the form of Gabor elementary functions. Therefore, texture analysis
takes the form of inner product or correlation of such primitives with textured
images.

The spatial frequencies are determined by the scale parameter $a$ and a base
frequency $F_0$ as $F_m = F_0\,a^{-m}$, where $m$ is an integer. The dominant localized
frequency is given by:

$$MF(x,y) = \frac{\sum_{m}\sum_{n} W_{mn}(x,y)\,F_m(x,y)}{\sum_{m}\sum_{n} W_{mn}(x,y)}. \qquad (10)$$

The variance of the localized frequency VF is

$$VF(x,y) = \frac{\sum_{m} |F_m(x,y) - MF(x,y)|}{m}. \qquad (11)$$



This feature represents the bandwidth of the localized spatial frequency. If it is
normalized by the mean localized frequency we obtain a scale invariant feature:

$$VF_{normalized}(x,y) = \frac{VF(x,y)}{MF(x,y)}. \qquad (12)$$

The mean and variance of the orientation are defined by

$$MT(x,y) = \frac{\sum_{m}\sum_{n} W_{mn}(x,y)\,T_n(x,y)}{\sum_{m}\sum_{n} W_{mn}(x,y)}, \qquad (13)$$
$$VT(x,y) = \frac{\sum_{n} |T_n(x,y) - MT(x,y)|}{n}, \qquad (14)$$

where $T_n = n\pi/K$.
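
The weighted means for frequency and orientation can be sketched at a single pixel as follows. This is an illustration under stated assumptions: `W[m][n]` holds nonnegative response magnitudes (how the complex responses are turned into weights is not specified here), the frequencies follow $F_m = F_0 a^{-m}$, and the orientations follow $T_n = n\pi/K$. The function name is hypothetical.

```python
import math

def dominant_frequency_and_orientation(W, F0, a, K):
    """Weighted mean frequency MF and orientation MT at one pixel.

    W[m][n] is an assumed nonnegative weight for scale m and orientation n.
    """
    S = len(W)
    total = sum(sum(row) for row in W)
    # Assumed conventions: F_m = F0 * a**(-m) and T_n = n * pi / K.
    mf = sum(W[m][n] * F0 * a ** (-m) for m in range(S) for n in range(K)) / total
    mt = sum(W[m][n] * n * math.pi / K for m in range(S) for n in range(K)) / total
    return mf, mt
```

When a single filter dominates the response, MF and MT reduce to that filter's own frequency and orientation.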

The local mean intensity and its variance are extracted to complete the set
of features. If the image contains smooth segments then the gray level
information is the only way to separate these regions. The locality of these features is
accomplished by averaging the intensity level using a filter equal in size to the
Gabor filter used to generate the Gabor feature space:

$$MI(x,y) = \frac{1}{N}\sum_{(x',y') \in A} I(x',y'), \qquad (15)$$

where $A$ is the set of $N$ pixels belonging to the area defined by the averaging
filter window centered at $(x,y)$ and $I(x,y)$ is the intensity image. The variance
of the intensity level is given by

$$VI(x,y) = \frac{\sum_{(x',y') \in A} |I(x',y') - MI(x,y)|}{N}. \qquad (16)$$
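
A minimal sketch of the two intensity features at one pixel follows. It assumes a square window clipped at the image border and uses the mean absolute deviation for the spread; a squared deviation can be substituted if a true variance is intended. The function name and the `radius` parameter are illustrative assumptions.

```python
def local_mean_and_variation(image, x, y, radius):
    """Mean intensity MI and mean absolute deviation over a square window A at (x, y)."""
    rows, cols = len(image), len(image[0])
    # Collect the N pixels of the window, clipped to the image bounds.
    window = [image[j][i]
              for j in range(max(y - radius, 0), min(y + radius + 1, rows))
              for i in range(max(x - radius, 0), min(x + radius + 1, cols))]
    n = len(window)
    mi = sum(window) / n
    vi = sum(abs(v - mi) for v in window) / n
    return mi, vi
```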



4 Application of Geodesic Snakes to the
Localized-Texture-Features Space

Application of the geodesic snakes mechanism to the localized texture feature 
space, derived from the Gabor space of images, is achieved by generalizing the 
inverse edge indicator function E, which attracts in turn the evolving curve 
towards the boundary in the classical and geodesic snakes schemes. A special 
feature of our approach is the metric introduced in the localized texture feature 
space, and used as the building block for the stopping function E in the geodesic 
active contours scheme. 

Sochen et al. proposed to view images and image feature spaces as
Riemannian manifolds embedded in a higher dimensional space. For example, a
gray scale image is a 2-dimensional Riemannian surface (manifold), with $(x,y)$
as local coordinates, embedded in $\mathbb{R}^3$ with $(X,Y,Z)$ as local coordinates. The
embedding map is $(X = x, Y = y, Z = I(x,y))$, and we write it, by abuse
of notation, as $(x,y,I)$. When we consider feature spaces of images, e.g. color
space, statistical moments space, and the Gaborian space, we may view the
image-feature information as an $N$-dimensional manifold embedded in an $N + M$
dimensional space, where $N$ stands for the number of local parameters needed
to index the space of interest and $M$ is the number of feature coordinates. In
our case, we may view the localized features image as a 2D manifold with
local coordinates $(x,y)$ embedded in an 8D feature space. The embedding map is
$(x, y, MF, VF, MT, VT, MI, VI)$.

MF,VF, MT,VT, MI,VI are functions of the local coordinates (x,y) and 
are the localized texture features, as described in the previous section. 

A basic concept in the context of Riemannian manifolds is distance. For
example, we take a two-dimensional manifold $\Sigma$ with local coordinates $(\sigma^1, \sigma^2)$.



Geodesic Active Contours Applied to Texture Feature Space 349 



Since the local coordinates are curvilinear, the distance is calculated using a
positive definite symmetric bilinear form called the metric, whose components are
denoted by $g_{\mu\nu}(\sigma^1, \sigma^2)$:

$$ds^2 = g_{\mu\nu}\,d\sigma^\mu d\sigma^\nu,$$

where we used the Einstein summation convention: elements with identical
superscripts and subscripts are summed over. The metric on the image manifold is
derived using a procedure known as pullback. The manifold's metric is then used
for various geometrical flows. We briefly review the pullback mechanism.

Let $X : \Sigma \to M$ be an embedding of $\Sigma$ in $M$, where $M$ is a Riemannian
manifold with a metric $h_{ij}$ and $\Sigma$ is another Riemannian manifold. We can use
the knowledge of the metric on $M$ and the map $X$ to construct the metric on
$\Sigma$. This pullback procedure is as follows:

$$g_{\mu\nu}(\sigma^1, \sigma^2) = h_{ij}(X(\sigma^1, \sigma^2))\,\frac{\partial X^i}{\partial \sigma^\mu}\frac{\partial X^j}{\partial \sigma^\nu}, \qquad (17)$$

where we used the Einstein summation convention, $i,j = 1, \ldots, \dim(M)$, and
$\sigma^1, \sigma^2$ are the local coordinates on the manifold $\Sigma$.

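If we pull back the metric of a 2D image manifold from the Euclidean
embedding space $(x,y,I)$ we get:

$$(g_{\mu\nu}) = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}. \qquad (18)$$

The determinant of $g_{\mu\nu}$ yields the expression $1 + I_x^2 + I_y^2$. Thus, we can rewrite
the expression for the stopping term $E$ in the geodesic snakes mechanism as
follows:

$$E(|\nabla I|) = \frac{1}{1 + |\nabla I|^2} = \frac{1}{\det(g_{\mu\nu})}. \qquad (19)$$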

The exact geometry of texture feature space is not known. Therefore, for
simplicity, we assume it is Euclidean. Moreover, since we have no previous knowledge
of the 2D feature-manifold metric, we assume that the distances measured on
the 2D manifold are the same as those measured in the 8D embedding space.
Thus, we may use the pullback mechanism to obtain the following metric:

$$(g_{\mu\nu}) = \begin{pmatrix} 1 + \sum_i (c_i A^i_x)^2 & \sum_i c_i^2 A^i_x A^i_y \\ \sum_i c_i^2 A^i_x A^i_y & 1 + \sum_i (c_i A^i_y)^2 \end{pmatrix}, \qquad (20)$$

where $A = (MF, VF, MT, VT, MI, VI)$, $c_i$ are regularization factors which
account for the different physical dimensions of each parameter, and $i$ goes over
all members of this set.

The metric $g$ derived in this section is strictly used for the purpose of
calculating an edge detector. It is used to measure distances on manifolds and its
components indicate the rate of change of the manifold in a given direction.
Therefore, the determinant of the metric is used as a positive definite edge
indicator: a large value indicates a strong gradient, while a small value indicates
where the manifold is almost flat. Thus, it is reasonable to set $E$ to be the inverse
of the determinant of $g_{\mu\nu}$.
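
A minimal sketch of this generalized stopping term at one pixel is given below. It assumes the spatial derivatives of each feature are already available as `(A_x, A_y)` pairs and builds the 2x2 pulled-back metric for a Euclidean embedding space; the function name and argument layout are illustrative assumptions.

```python
def feature_stopping_term(grads, c=None):
    """Inverse determinant of the pulled-back metric at one pixel.

    grads: list of (A_x, A_y) pairs, one per feature channel.
    c: optional list of regularization factors c_i (defaults to 1 for each).
    """
    c = c or [1.0] * len(grads)
    g11 = 1.0 + sum((ci * ax) ** 2 for ci, (ax, ay) in zip(c, grads))
    g22 = 1.0 + sum((ci * ay) ** 2 for ci, (ax, ay) in zip(c, grads))
    g12 = sum(ci * ci * ax * ay for ci, (ax, ay) in zip(c, grads))
    det = g11 * g22 - g12 * g12
    return 1.0 / det
```

With a single feature channel the determinant reduces to $1 + |\nabla I|^2$, so the gray-level stopping function is recovered as a special case.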

5 Results and Discussion

In our application of geodesic snakes to textural images, we have used the
mechanism offered by Lee to generate the Gabor wavelets for five scales
and eight orientations. In the geodesic snakes mechanism $U$ was initiated to
be a signed distance function. For simplicity, we have set the values of $c_i$
discussed in Section 4 to be 1. Following are the results for a few test
images. For the complete set of full size images and a demo see the web-page:
http://www-visl.technion.ac.il/scalespace01

Fig. 1. a. A synthetic image made up of 2D sinewave gratings containing
background and object of different orientations (left), b. The stopping function E
(middle), c. The resultant boundary (right).

In the first example (Fig. 1) the test image is a synthesized texture composed
of vertical and horizontal lines. Application of the geodesic snakes algorithm
results in an accurate boundary. In the second example the test image is composed
of two textures which differ in their scale (Fig. 2). The resultant boundary is
located at the interior of the circles rather than on their exact boundary. The
reason for that might be that since the two textures are quite similar, the change
of scale is noted by the Gabor filters only when they are properly located within
the internal texture.

Our next example is composed of two different textures taken from the
Brodatz album (Fig. 3). Since these textures are characterized by small variations
in their dominant scale and orientation, the six localized features are submitted
to nonlinear smoothing prior to the generation of the stopping term E, which
is in turn smoothed by the same process. The nonlinear smoothing
procedure used is the Beltrami flow. The degree of smoothing was
empirically determined to obtain satisfactory results.

We have shown that it is useful to extend the definition of the stopping term
E used in the geodesic snakes scheme to features other than intensity gradients.
In the original work of Porat and Zeevi the six localized features are used as
vector components to determine distance between textures. This allows for
determining the mean value of these features for each texture. In the proposed
segmentation process the localized features are calculated for each pixel and
therefore hold a large degree of intra-variation. Thus, if the texture is not
homogeneous we may need higher statistical moments such as kurtosis of the
localized texture features. Yet, we have shown that this algorithm can be
successfully applied to textures that are characterized by a small degree of
intra-variation.

Fig. 2. a. A synthetic image made up of 2D sinewave gratings containing
background and object of different scales (left), b. The stopping function E (middle),
c. The resultant boundary (right).



Fig. 3. a. An image comprised of two textures taken from the Brodatz album of
textures (left), b. The stopping function E (middle), c. The resultant
boundary (right).

Abstract. The scale-space approach to the differential structure of color
images was recently introduced by Geusebroek et al., based on the
pioneering work of Koenderink's Gaussian derivative color model. To
master this theory faster, we present the theory as a practical
implementation in the computer algebra package Mathematica for the extraction
of color differential structure. Many examples are given, and the practical
code examples enable easy, extensive experimentation. High level
programming is now fast: all examples run within 5-15 seconds on typical
images on a typical modern PC.

1 Color Image Formation and Color Invariants 

Color is an important extra dimension. Information extracted from color is useful
for almost any computer vision task, like segmentation, surface characterization,
etc. The field of color science is huge, and many theories exist. It is far beyond
the scope of this paper to cover even a fraction of the many different approaches.
We will focus on a single recent theory, based on scale-space models for the 
color sensitive receptive fields in the front-end visual system. We are especially 
interested in the extraction of multi-scale differential structure (derivatives) in 
the spatial and the wavelength domain of color images. What is color invariant 
structure? To understand that notion, we first have to study the process of color 
image formation. 

The light spectrum falling onto the eye results from interaction between a
light source, the object, and the observer. Color may be regarded as the
measurement of spectral energy, and will be handled in the next section. Here, we
only consider the interaction between light source and material. Before we see
an object as having a particular color, the object needs to be illuminated. After
all, in darkness objects are simply black. The emission spectra $l(\lambda)$ of common
light sources are close to Planck's formula (NB: $\lambda$ in nm):

h = 6.626176 10^-34; c = 2.99792458 10^8; k = 1.380662 10^-23;

l[λ_, T_] = 8 π h c (10^-9 λ)^-5 / (Exp[h c / (k T 10^-9 λ)] - 1);

where h is Planck's constant, k Boltzmann's constant, and c the velocity of light
in vacuum. The color temperature of the emitted light is given by T, and typically
ranges from 2500 K (warm red light) to 10,000 K (cold blue light). Note that the
terms "warm" and "cold" are given by artists, and refer to the sensation caused
by the light. Representative white light is, by convention, chosen to be at a
temperature of 6500 K. However, in practice, all light sources between 10,000 K
and 2500 K can be found. Planck's equation is adequate for incandescent light
and halogen. The spectrum of daylight is slightly different, and is represented
by a correlated color temperature. Daylight is close enough to the Planckian
spectrum to be characterized by an equivalent parameter.

The part of the spectrum reflected by a surface depends on the surface spectral
reflection function. The spectral reflectance is a material property, characterized
by a function $c(\lambda)$. For planar, matte surfaces, the spectrum reflected by the
material $e(\lambda)$ is simplified as the multiplication between the spectrum falling
onto the surface $l(\lambda)$ and the surface spectral reflectance function $c(\lambda)$:
$e(\lambda) = c(\lambda)\,l(\lambda)$.

At this point it is meaningful to introduce spatial extent, hence to describe the
spatio-spectral energy distribution $e(x,y,\lambda)$ that falls onto the retina. Further,
for three-dimensional objects the amount of light falling onto the object's surface
depends on the energy flux, thus on the local geometry. Hence shading (and
shadow) may be introduced as being a wavelength independent multiplication
factor $m(x,y)$ in the range [0,1]: $e(x,y,\lambda) = c(x,y,\lambda)\,l(\lambda)\,m(x,y)$. Note that
the illumination $l(\lambda)$ is independent of position. Hence the equation describes
spectral image formation of matte objects, illuminated by a single light source.
For shiny surfaces the image formation equation has to be extended with an
additive term describing the Fresnel reflected light (see the references for more
details).

The structure of the spatio-spectral energy distribution is due to the three
functions $c$, $l$, and $m$. By making some general assumptions, these quantities
may be derived from the measured image. Estimation of the object reflectance
function $c$ boils down to deriving material properties, the "true" color
invariant which does not depend on illumination conditions. Estimation of the light
source $l$ is well known as the color constancy problem. Determining $m$ is in fact
estimating the shadows and shading in the image, and is closely related to the
shape-from-shading problem.

For the extraction of color invariant properties from the spatio-spectral
energy distribution we search for algebraic or differential expressions of $e$ which
are independent of $l$ and $m$. Hence the goal is to solve for differential expressions
of $e$ which result in a function of $c$ only.

To proceed, note that the geometrical term $m$ is only a function of spatial
position. Differentiation with respect to $\lambda$, and normalization, reduces the
problem to only two functions:
$e(x,\lambda) = c(x,\lambda)\,l(\lambda)\,m(x) \Rightarrow \frac{e_\lambda}{e} = \frac{l_\lambda}{l} + \frac{c_\lambda}{c}$
(indices indicate differentiation). After additional differentiation to the spatial
variable $x$ or $y$, the first term vanishes, since $l$ only depends on $\lambda$:

e[x_, λ_] = c[x, λ] l[λ] m[x];
D[D[e[x, λ], λ] / e[x, λ], x] // shortnotation

(c c_xλ - c_x c_λ) / c^2
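
The invariance claimed here can be checked numerically with a minimal sketch. It assumes user-supplied functions for the reflectance `c`, the illumination `l` and the shading `m`, and estimates the derivatives by central differences; the function name and step size are illustrative choices.

```python
import math

def normalized_spectral_derivative_dx(c, l, m, x, lam, h=1e-4):
    """Finite-difference estimate of d/dx [e_lambda / e] for e = c(x, lam) * l(lam) * m(x)."""
    def e(x_, lam_):
        return c(x_, lam_) * l(lam_) * m(x_)

    def ratio(x_):
        # Normalized wavelength derivative e_lambda / e at position x_.
        e_lam = (e(x_, lam + h) - e(x_, lam - h)) / (2 * h)
        return e_lam / e(x_, lam)

    # Spatial derivative of the normalized wavelength derivative.
    return (ratio(x + h) - ratio(x - h)) / (2 * h)
```

For a test reflectance such as $c(x,\lambda) = \exp(x\lambda)$ the exact value is 1, and the estimate stays at 1 (up to discretization error) when the illumination or the shading function is changed.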
The spatial derivative of the normalized wavelength derivative, after applying
the chain rule,

D[D[e[x, y, λ], λ] / e[x, y, λ], x] // shortnotation

(e e_xλ - e_x e_λ) / e^2

is completely expressed in spatial and spectral derivatives of the observable
spatio-spectral energy distribution.

We develop the differential properties of the invariant color-edge detector
$\mathcal{E} = \frac{1}{e}\frac{\partial e}{\partial\lambda}$, where the measured spectral intensity $e = e(x,y,\lambda)$. Spatial derivatives
of $\mathcal{E}$, like $\frac{\partial\mathcal{E}}{\partial x}$, contain derivatives to the spatial as well as to the wavelength
dimension due to the chain rule. In the next section we will see that the zero-th,
first and second order derivative-to-$\lambda$ kernels are acquired from the transformed
RGB space of the image directly. The derivatives to the spatial coordinates are
acquired in the conventional way, i.e. convolution by a spatial Gaussian kernel.



2 Koenderink’s Gaussian Derivative Color Observation 
Model 

Spatial structure can be extracted from the data in the environment by measur- 
ing the N-jet of (scaled) derivatives to some order. For the spatial domain this 
has led to the family of Gaussian derivative kernels, sampling the spatial inten- 
sity distribution. These derivatives naturally occur in a local Taylor expansion 
of the signal. 

Koenderink proposed to take a similar approach to the sampling of the
color dimension, i.e. the spectral information contained in the color. If we
construct the Taylor expansion of the spatio-spectral energy distribution $e(x,y,\lambda)$ of
the measured light to wavelength, in the fixed spatial point $(x_0, y_0)$, and around
a central wavelength $\lambda_0$ we get (to second order):

Series[e[x0, y0, λ], {λ, λ0, 2}]

e[x0, y0, λ0] + e^(0,0,1)[x0, y0, λ0] (λ - λ0) +
  1/2 e^(0,0,2)[x0, y0, λ0] (λ - λ0)^2 + O[λ - λ0]^3

A physical measurement with an aperture is mathematically described with
a convolution. So for a measurement of the luminance $L$ with aperture
function $G(x,\sigma)$ in the (here in the example 1D) spatial domain we get:
$L(x;\sigma) = \int_{-\infty}^{\infty} L(x - y)\,G(y;\sigma)\,dy$, where $y$ is the dummy spatial shift parameter running
over all possible values. For the temporal domain we get
$L(t;\sigma) = \int_{-\infty}^{\infty} L(t - s)\,G(s;\sigma)\,ds$, where $s$ is the dummy temporal shift parameter running over all
possible values in time. Based on this analogy, we might expect a measurement
along the color dimension to look like:
$L(\lambda;\sigma) = \int_{-\infty}^{\infty} L(\lambda - \mu)\,G(\mu;\sigma)\,d\mu$, where
$\lambda$ is the wavelength and $\mu$ is the dummy wavelength shift parameter.
In the scale-space model for vision the front-end visual system has
implemented the shifted spatial kernels with a grid on the retina with receptive fields,
so the shifting is implemented by the simultaneous measurement of all the
neighboring receptive fields. The temporal kernels are implemented as time-varying
lateral geniculate nucleus (LGN) and cortical receptive fields. However, a wide
range of receptive fields which shift over the wavelength axis in
sensitivity would require a lot of different photo-sensitive dyes (rhodopsins) in
the receptors with these different, shifted color sensitivities. The visual system
has opted for a cheaper solution: the convolution is calculated at just a single
position on the wavelength axis, at around $\lambda_0 = 520$ nm, with a standard
deviation of the Gaussian kernel of about $\sigma_\lambda = 55$ nm. The integration is done over
the range of wavelengths that is covered by the rhodopsins, i.e. from about 350
nm (blue) to 700 nm (red). The values for $\lambda_0$ and $\sigma_0$ are determined from the
best fit of a Gaussian to the spectral sensitivity as measured psychophysically
in humans, i.e. the Hering model.

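So we get for the spectral intensity

$$e(x,\lambda_0;\sigma_0) = \int_{\lambda_{min}}^{\lambda_{max}} e(x,\lambda)\,G(\lambda,\lambda_0;\sigma_\lambda)\,d\lambda.$$

This is a 'static' convolution operation (i.e. inner product in function space). It
is not a convolution in the familiar sense, because we don't shift over the whole
wavelength axis. We just do a single measurement with a Gaussian aperture over
the wavelength axis at the position $\lambda_0$. Similarly, the derivatives to $\lambda$:

$$\frac{\partial e(x,\lambda_0)}{\partial\lambda} = \sigma_\lambda \int_{\lambda_{min}}^{\lambda_{max}} e(x,\lambda)\,\frac{\partial G(\lambda,\lambda_0,\sigma_\lambda)}{\partial\lambda}\,d\lambda$$

and

$$\frac{\partial^2 e(x,\lambda_0)}{\partial\lambda^2} = \sigma_\lambda^2 \int_{\lambda_{min}}^{\lambda_{max}} e(x,\lambda)\,\frac{\partial^2 G(\lambda,\lambda_0,\sigma_\lambda)}{\partial\lambda^2}\,d\lambda$$

describe the first and second order spectral derivative respectively. The factors
$\sigma_\lambda$ and $\sigma_\lambda^2$ are included for the normalization, i.e. to make the Gaussian spectral
kernels dimensionless.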
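
The three 'static' spectral measurements can be sketched numerically as follows. This is an illustration under stated assumptions: the spectrum is supplied as a Python function of wavelength in nm, the integrals are approximated with the midpoint rule, and the function names and step count are arbitrary choices.

```python
import math

def gaussian(lam, lam0, sigma):
    """Normalized Gaussian kernel centred at lam0 with standard deviation sigma."""
    return math.exp(-((lam - lam0) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def spectral_measurements(spectrum, lam0=520.0, sigma=55.0,
                          lam_min=350.0, lam_max=700.0, steps=3500):
    """Return the zeroth, first and second order spectral measurements."""
    d = (lam_max - lam_min) / steps
    e0 = e1 = e2 = 0.0
    for k in range(steps):
        lam = lam_min + (k + 0.5) * d
        g = gaussian(lam, lam0, sigma)
        u = (lam - lam0) / sigma
        e0 += spectrum(lam) * g * d
        e1 += spectrum(lam) * (-u) * g * d           # sigma * dG/dlambda = -u * G
        e2 += spectrum(lam) * (u * u - 1.0) * g * d  # sigma^2 * d2G/dlambda2 = (u^2 - 1) * G
    return e0, e1, e2
```

For a flat spectrum the zeroth order measurement is close to the spectrum's constant value, while the first order measurement is close to zero, since the odd kernel nearly cancels over the integration range.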

In Fig. 1 the graphs of the 'static' normalized Gaussian spectral kernels to
second order as a function of wavelength are given.

Color sensitive receptive fields come in the combinations red-green and yellow-blue
center-surround receptive fields. The subtraction of yellow and blue in these
receptive fields is well modeled by the first order derivative to $\lambda$, the subtraction
of red and green minus the blue is well modeled by the second order derivative
to $\lambda$. Alternatively, one can say that the zero-th order receptive field measures
the luminance, the first order the 'blue-yellowness', and the second order the
'red-greenness'.

Note: the wavelength axis is a half axis. It is known that for a half axis (such
as with positive-only values) a logarithmic parameterization is the natural way
to 'step along' the axis. E.g. the scale axis is logarithmically sampled in
scale-space (remember the 'orders of magnitude'), the intensity is logarithmically
transformed in the photoreceptors, and the time axis can only be measured
causally when we sample it logarithmically. We might conjecture here a better
fit to the Hering model with a logarithmic wavelength axis.

gaussλ[λ_, σ_] = D[gauss[λ, σ], λ];
gaussλλ[λ_, σ_] = D[gauss[λ, σ], {λ, 2}];
λ0 = 520; σ0 = 55;

Fig. 1. The zero-th, first and second derivative of the Gaussian function with
respect to wavelength as models for the color receptive field's wavelength
sensitivity in human color vision. After [Koenderink 1998a]. The central wavelength
is 520 nm, the standard deviation 55 nm.

The Gaussian color model needs the first three components of the Taylor
expansion of the Gaussian weighted spectral energy distribution at $\lambda_0$ and scale
$\sigma_0$. An RGB camera measures the red, green and blue component of the incoming
light, but this is not what we need for the Gaussian color model. We need a
method to extract the Taylor expansion terms from the RGB values.

An RGB camera approximates the CIE 1964 XYZ basis for colorimetry by
the linear transformation matrix rgb2xyz, while Geusebroek et al. [2] give the
best linear transform from the XYZ values to the Gaussian color model, i.e.
matrix xyz2e:

$$\mathrm{rgb2xyz} = \begin{pmatrix} 0.621 & 0.113 & 0.194 \\ 0.297 & 0.563 & 0.049 \\ -0.009 & 0.027 & 1.105 \end{pmatrix}, \qquad \mathrm{xyz2e} = \begin{pmatrix} -0.019 & 0.048 & 0.011 \\ 0.019 & 0.000 & -0.016 \\ 0.047 & -0.052 & 0.000 \end{pmatrix}$$

The resulting transform from the measured RGB input image to the sampling 'à la
human vision' is the product of the above matrices: colorRF = xyz2e . rgb2xyz.
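
The product of the two matrices can be written out as a minimal sketch in plain Python. The matrix entries are the ones quoted above; the helper names `matmul` and `matvec` are illustrative and not part of the original Mathematica code.

```python
RGB2XYZ = [[0.621, 0.113, 0.194],
           [0.297, 0.563, 0.049],
           [-0.009, 0.027, 1.105]]

XYZ2E = [[-0.019, 0.048, 0.011],
         [0.019, 0.000, -0.016],
         [0.047, -0.052, 0.000]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix with a length-3 vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

# Combined transform from RGB to the three Gaussian color model measurements.
COLOR_RF = matmul(XYZ2E, RGB2XYZ)
```

Applying `COLOR_RF` to an RGB triple gives the same result as first converting to XYZ and then to the Gaussian color model, so only one matrix needs to be applied per pixel.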

The Gaussian color model is an approximation, but has the attractive prop- 
erty of fitting very well into Gaussian scale-space theory. The notion of image 
structure is extended to the wavelength domain in a very natural and coherent 
way. The similarity with human differential-color receptive fields is more than a 
coincidence. 

Now we have all the tools to come to an actual implementation. The RGB
values of the input image are transformed into Gaussian color model space, and
plugged into the spatio-spectral formula for the color invariant feature. Next to
the derivatives to wavelength we need spatial derivatives, which are computed in
the regular way with spatial Gaussian derivative operators. The full machinery
of e.g. gauge coordinates and invariance under specific groups of transformations
is also applicable here. The next section details the implementation.



3 Implementation 

In Mathematica a color image is represented as a two dimensional array of
color triplets. The RGB triples are converted into measurements through the
color receptive fields in the retina with the transformation matrix colorRF
defined in the previous section. To transform every RGB triple we map the
transformation to our input image as a pure function at the second list level:

observedimage = Map[Dot[colorRF, #] &, im, {2}];




Fig. 2. The input image (left) and the observed images $e$, $e_\lambda$ and $e_{\lambda\lambda}$ with the
color differential receptive fields. Image resolution 228x179 pixels.



The color image data set can be smartly resliced by a reordering Transpose:

obs = Transpose[observedimage, {2, 3, 1}];

The resulting data set is a list of three scalar images, allowing us to access the
measurements $e$, $e_\lambda$ and $e_{\lambda\lambda}$ individually as scalar images (see Fig. 2).
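
The mapping and reslicing steps can be mirrored in a minimal Python sketch. It assumes the image is a nested list indexed as `image[row][col]` holding RGB triples and that a 3x3 transformation matrix is passed in; both function names are hypothetical.

```python
def observe(image, color_rf):
    """Apply a 3x3 matrix to every RGB triple of image[row][col] = (r, g, b)."""
    return [[[sum(color_rf[i][k] * px[k] for k in range(3)) for i in range(3)]
             for px in row]
            for row in image]

def reslice(observed):
    """Reorder [row][col][channel] into [channel][row][col]."""
    return [[[px[ch] for px in row] for row in observed] for ch in range(3)]
```

After reslicing, the three entries of the outer list are the scalar images for the zeroth, first and second order spectral measurements.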

We now develop the differential properties of our invariant color-edge detector
$\mathcal{E} = \frac{1}{e}\frac{\partial e}{\partial\lambda}$, where the spectral intensity $e = e(x,y,\lambda)$. The derivatives to the
spatial and spectral coordinates are easily found with the chain rule. Here are
the explicit forms:

ℰ := D[e[x, y, λ], λ] / e[x, y, λ];
shortnotation[{D[ℰ, x], D[ℰ, y], D[ℰ, λ]}]

{(e e_xλ - e_x e_λ)/e^2, (e e_yλ - e_y e_λ)/e^2, (e e_λλ - e_λ^2)/e^2}
The gradient magnitude (detecting yellow-blue transitions) becomes:

𝒢 = Simplify[Sqrt[D[ℰ, x]^2 + D[ℰ, y]^2]];
𝒢 // shortnotation

Sqrt[((e e_xλ - e_x e_λ)^2 + (e e_yλ - e_y e_λ)^2) / e^4]

The second spectral order gradient (detecting purple-green transitions) becomes:

𝒲 = Simplify[Sqrt[D[ℰ, x, λ]^2 + D[ℰ, y, λ]^2]];

Finally, the total edge strength 𝒩 (for all color edges) in the spatio-spectral
domain becomes:

𝒩 = Simplify[Sqrt[D[ℰ, x]^2 + D[ℰ, y]^2 + D[ℰ, x, λ]^2 + D[ℰ, y, λ]^2]];
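
The three edge strengths can be evaluated pointwise once the required derivatives are known. The following minimal sketch assumes the nine derivative values at a pixel are given as plain numbers; the mixed-derivative formula is obtained by differentiating $(e\,e_{\lambda\lambda} - e_\lambda^2)/e^2$ with respect to a spatial variable. The function name and argument order are illustrative.

```python
import math

def color_edge_strengths(e, ex, ey, el, exl, eyl, ell, exll, eyll):
    """Pointwise first-order (G), second-order (W) and total (N) edge strengths."""
    # d/dx (e_l / e) = (e * e_xl - e_x * e_l) / e^2, and likewise for y.
    Ex = (e * exl - ex * el) / e ** 2
    Ey = (e * eyl - ey * el) / e ** 2
    # d^2/(dx dl) (e_l / e), from differentiating (e * e_ll - e_l^2) / e^2 to x.
    Exl = (e * e * exll - e * ex * ell - 2 * e * el * exl + 2 * ex * el * el) / e ** 3
    Eyl = (e * e * eyll - e * ey * ell - 2 * e * el * eyl + 2 * ey * el * el) / e ** 3
    G = math.sqrt(Ex ** 2 + Ey ** 2)
    W = math.sqrt(Exl ** 2 + Eyl ** 2)
    N = math.sqrt(Ex ** 2 + Ey ** 2 + Exl ** 2 + Eyl ** 2)
    return G, W, N
```

As a sanity check, for $e(x,\lambda) = \exp(x\lambda)$ the normalized derivative is $e_\lambda/e = x$, so the first-order strength should be 1 and the second-order strength 0.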

As an example, we implement this last expression for discrete images. First we
replace each occurrence of a derivative to $\lambda$ with the respective plane in the
observed image rf (by the color receptive fields). Note that we use rf[[nλ+1]]
because the zero-th list element is the Head of the list.

We will look for derivative patterns in the Mathematica expression for 𝒩
and replace them with another pattern. We do this pattern matching with the
command /. (ReplaceAll). We call the observed image at this stage rf, without
any assignment to data, so we can do all calculations symbolically first:

Clear[rf0, rf1, rf2, σ]; rf = {rf0, rf1, rf2};

𝒩 = 𝒩 /. {Derivative[nx_, ny_, nλ_][e][x, y, λ] :>
      Derivative[nx, ny][rf[[nλ + 1]]][x, y], e[x, y, λ] :> rf[[1]]} // Simplify;

Note that we use a delayed rule here (:> instead of ->) because we want to
evaluate the right hand side only after the rule is applied. We finally
replace the spatial derivatives with the spatial Gaussian derivative convolution
gD at scale σ:

M = M /. {Derivative[nx_, ny_][rf_][x, y] :> gD[rf, nx, ny, σ],
      rf1[x, y] -> rf1, rf2[x, y] -> rf2}

Fig. 3. The color-invariant M calculated for our input image at spatial scale
σ = 1 pixel. Primarily the red-green color edges are found, as expected, with
little edge detection at intensity edges. Image resolution 228x179 pixels.

The resulting expression for the total edge strength can now safely be calculated
on the discrete data (see Fig. 3). Equivalent expressions can be formulated for
the yellow-blue edges G and the red-green edges W; the results of these detectors
are given in Fig. 4.











Fig. 4. Left: original color image. Middle: The yellow-blue edge detector G
calculated at a spatial scale σ = 1 pixel. Note that there is hardly any red-green
edge detection. Right: output of the red-green edge detector W. Image resolution
249x269.



4 Combination with Spatial Constraints 

Interesting combinations can be made when we combine the color differential 
operators with the spatial differential operators. E.g. when we want to detect 
specific blobs with a specific size and color, we can apply feature detectors that 
are best matching the shape to be found. We end the paper with one example:
locating PAS stained material in a histological preparation. This example
illustrates the possible use of color differential operators and spatial differential
operators in microscopy.

Blobs are detected by calculating those locations (pixels) where the Gaussian
curvature $L_{gc} = L_{xx} L_{yy} - L_{xy}^2$ on the black-and-white version (imbw) of
the image is greater than zero. This indicates a convex 'hilltop'. Pixels on the
boundaries of the 'hilltop' are detected by requiring the second order directional
derivative in the direction of the gradient,
$(L_x^2 L_{xx} + 2 L_x L_y L_{xy} + L_y^2 L_{yy}) / (L_x^2 + L_y^2)$,
to be positive. Interestingly, by using these invariant shape detectors we are
largely independent of image intensity. For the color scheme we rely on ℰ and
its first and second order derivatives with respect to λ.
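The two spatial tests described above can be written down per pixel as follows. This is a minimal sketch, assuming the first and second order Gaussian derivative responses (Lx, Ly, Lxx, Lxy, Lyy) have already been computed elsewhere, for example with a Gaussian derivative operator such as gD; the function names are ours.

```python
def gaussian_curvature_sign_test(Lxx, Lxy, Lyy):
    """True where Lxx*Lyy - Lxy^2 > 0, the blob criterion of the text."""
    return Lxx * Lyy - Lxy ** 2 > 0


def second_derivative_along_gradient(Lx, Ly, Lxx, Lxy, Lyy):
    """Second order directional derivative in the gradient direction.

    Returns 0.0 where the gradient vanishes, because the direction is
    undefined there (a convention chosen for this sketch).
    """
    grad_sq = Lx ** 2 + Ly ** 2
    if grad_sq == 0:
        return 0.0
    return (Lx ** 2 * Lxx + 2 * Lx * Ly * Lxy + Ly ** 2 * Lyy) / grad_sq


# Example derivative values at three hypothetical pixels.
elliptic = gaussian_curvature_sign_test(-2.0, 0.0, -2.0)   # hill-like point
saddle = gaussian_curvature_sign_test(1.0, 0.0, -1.0)      # saddle point
lww = second_derivative_along_gradient(1.0, 1.0, 2.0, 0.0, 2.0)
```

Both quantities are built from derivatives only, which is why a combined detector can apply them to the intensity image while the color test is applied to ℰ separately.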

In Fig. 5 we detect stained carbohydrate deposits in a histological application
using this combined color and spatial structure detection mechanism. 

The Mathematica functions not described in the text and the images are 
available from the first author. 
Fig. 5. Detection of carbohydrate stacking in the mucus in intestinal cells that
are specifically stained for carbohydrates with periodic acid Schiff (P.A.S.).
The carbohydrate deposits are in magenta, cell nuclei in blue. The blob-like
areas are detected with positive Gaussian curvature and positive second order
directional derivative in the gradient direction of the image intensity, the
magenta with a boolean combination of the color invariant ℰ and its derivatives
with respect to λ. Scale = 4 pixels. Example due to P. Van Osta. Image taken from
http://www.bris.ac.uk/Depts/PathAndMicro/CPL/pas.html.
Abstract. We propose image enhancement, edge detection, and segmentation
models for the multi-channel case, motivated by the philosophy of processing
images as surfaces, and generalizing the Mumford-Shah functional. Refer to
http://www.cs.technion.ac.il/~sova/canada01/ for color figures.



1 Introduction 

We provide a general variational framework for color images, generalizing the 
Mumford-Shah functional. Our goal is to provide a theoretical background for 
the model proposed and implemented in 

In Section 2 we give a review of variational segmentation and color edge
detection. Section 3 offers a summary of the theory of the Mumford-Shah
functional and of numerical minimization methods devised for this functional. We
propose two generalizations of the Mumford-Shah functional in Section 4 and
show some numerical results.

A few words on the notation. The norm $|\cdot|$ is the usual Euclidean norm
of any object: a number, a vector, or a matrix. In particular, for a function
$u : \mathbb{R}^n \to \mathbb{R}^m$ we put $|\nabla u|^2 = \sum_{i,j} (\partial u^i / \partial x_j)^2$.
$\mathcal{L}^n$ is the Lebesgue measure on $\mathbb{R}^n$.
$\mathcal{H}^{n-1}$ is the $(n-1)$-dimensional Hausdorff measure, which is a generalization of
the area of a submanifold.

2 Image Segmentation 

We consider images as functions from a domain in $\mathbb{R}^2$ into some set, that will
be called the feature space. When needed, we suppose that the domain is $[0, 1]^2$.
Some examples of feature spaces are $[0, 255]$ or $[0, \infty)$, for gray-level images, or
$[0, 1]^3$ for color images in RGB.

In [28] Mumford and Shah suggested segmenting an image by minimizing a
functional of the form $\int_{\Omega \setminus K} (|\nabla u|^2 + \alpha |u - w|^2) + \beta \, \mathrm{length}(K)$,
where $K$ is the union of edges in the image. They conjectured that there are
minimizers over $u \in C^1(\Omega \setminus K)$ and $K$ being a finite union of smooth arcs, and
described possible conjectured configurations for endpoints and crossings in $K$.

M. Kerckhove (Ed.): Scale-Space 2001, LNCS 2106, pp. 3624^2^ 2001.

© Springer-Verlag and IEEE/CS 2001

The minimization of the Mumford-Shah functional poses a difficult problem,
both theoretical and numerical, because it contains both area and length terms
and is minimized with respect to two variables: a function $f : \Omega \to \mathbb{R}$ and a set
$K \subset \Omega$. Functionals of this kind were introduced in CHI.

The usual means of providing coupling between the color channels is by defining
a suitable edge indicator function $e$, that is supposed to be small in the
smooth parts of the image and large in the vicinity of an edge. A typical example
is $e(x) = |\nabla f(x)|^2$, and the integral $\int e$ usually constitutes the smoothing
term.

One of the promising frameworks to derive and justify edge indicators is to 
consider images as embedded manifolds and to look at the induced metric for 
qualitative measurements of image smoothness. This idea first appeared in El, 
and was extended in m- 

This interpretation was formulated in the most general way and implemented 
in m- The $n$-dimensional $m$-valued image is considered as an $n$-dimensional
manifold in $\mathbb{R}^{n+m}$ given by $(x_1, \ldots, x_n, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n))$,
and $\sqrt{\det g}$, where $g$ is the metric on the manifold induced by the metric $h$ from
$\mathbb{R}^{n+m}$, is taken to be the edge indicator function. The integral $\int \sqrt{\det g}$ gives the
$n$-dimensional volume of the manifold, and its minimization brings on a kind of
non-isotropic diffusion, which the authors called the Beltrami flow.

As pointed out in EH, when implementing such a diffusion, one must decide 
what is the relationship between unit lengths along the Xi axes and along the fj 
axes. The significance of the ratio of the scales is discussed in detail in EH. We
will denote this coefficient by $\gamma$.

In the case of gray-level images this framework was first introduced in [22].
Here the image is a surface in $\mathbb{R}^3$, the edge indicator is the area element
$(1 + f_x^2 + f_y^2)^{1/2}$, and the flow is closely related to the mean curvature flow.

In a number of works (e.g. EH EH) another problem is considered, leading to 
very similar equations. It is the problem of smoothing, scaling and segmenting 
an image in a “non-flat” feature space, like a circle, a sphere or a projective line. 

Other related formulations and models for color images were proposed in 

H 0 EOl Cni El 13 El 

3 Mumford-Shah Functional 

The core difficulty in proving this conjecture is that the functional is a sum of
an area integral and a curvilinear integral, and the curve of integration is one of
the variables.

There is yet another problem, namely that the proposed domain of $u$ and
$K$ is too restrictive and lacks some convenient properties (compactness, lower
semicontinuity of the functionals in question). This one, however, is ordinary;
instead of imposing that $K$ is a finite union of smooth arcs, we should drop this
requirement, and prove later that a minimizing $K$ must be smooth. This also
necessitates replacing length($K$) with something defined on non-smooth sets;
the most natural replacement is $\mathcal{H}^1(K)$, the one-dimensional Hausdorff measure
of $K$.

Alexander Brook, Ron Kimmel, and Nir A. Sochen

The crucial idea in overcoming the difficulties of interaction between the area
and the length terms is to use a weak formulation of the problem. First, we let
$K$ be the set of jump points of $u$: $K = S_u$. The functional thus depends on $u$
only. Second, we relax the functional in $L^2$, that is, we consider

$$E(u) = \inf \{ \liminf_{k \to \infty} F(u_k, S_{u_k}) : u_k \to u, \; u_k \in C^1(\Omega \setminus \overline{S_{u_k}}), \; \mathcal{H}^1(\overline{S_{u_k}} \setminus S_{u_k}) = 0 \}.$$

It turns out (see 0) that this functional has an integral representation 

$$E(u) = \int_\Omega (|\nabla u|^2 + \alpha |u - w|^2) + \beta \mathcal{H}^1(S_u)$$

and if $E(u)$ is finite then $u \in SBV$, the space of special functions of bounded
variation. For the definition of SBV, and also of BV, GBV, and GSBV spaces 
we refer the reader to the book 

In this weak setting it was shown in m that there are indeed minimizers
of $E$ and that at least some of them are regular enough (with $K$ closed and
$u \in C^1(\Omega \setminus K)$). Actually, it was proven for the more general case of $\Omega \subset \mathbb{R}^n$,
$n \ge 2$, and $F(u) = \int_\Omega (|\nabla u|^2 + \alpha |u - w|^2) + \beta \mathcal{H}^{n-1}(S_u)$.

An interesting and important limiting case of the Mumford-Shah functional
is the problem

$$F(u) = \int_\Omega \alpha |u - w|^2 + \beta \mathcal{H}^{n-1}(S_u), \qquad \nabla u = 0 \text{ on } \Omega \setminus S_u,$$

of approximating $w$ by a piecewise-constant function. For this functional, the
Mumford-Shah conjecture was proved already in the original paper [28]; an
elementary constructive proof can be found in m- Existence of minimizers for any
$n \ge 2$ was shown in PH-

The main difficulty that hampers attempts to minimize the Mumford-Shah 
functional E{u,K) numerically is the necessity to somehow store the set K, 
keep track of possible changes of its topology, and calculate its length. Also, the
number of possible discontinuity sets is enormous even on a small grid. 

We can, however, try to find another functional approximating the Mumford- 
Shah functional that will also be more amenable to numerical minimization. The 
framework for this kind of approximation is $\Gamma$-convergence, introduced in m
(also see the book iia)-

Consider a metric space $(X, d)$. A sequence of functionals $F_i : X \to \mathbb{R}_+$ is
said to $\Gamma$-converge to $F : X \to \mathbb{R}_+$ ($\Gamma$-$\lim F_i = F$) if for any $f \in X$

$$\forall f_i \to f : \liminf F_i(f_i) \ge F(f) \quad \text{and} \quad \exists f_i \to f : \limsup F_i(f_i) \le F(f).$$

We can extend this definition to families of functionals depending on a continuous
parameter $\varepsilon \downarrow 0$, requiring convergence of $F_{\varepsilon_i}$ to $F$ on every sequence $\varepsilon_i \downarrow 0$.
It is important to notice that the $\Gamma$-limit depends on what kind of convergence
we have on $X$. Sometimes, to avoid ambiguities, it is designated as $\Gamma(X)$- or
$\Gamma(d)$-limit. For us, the most important property of $\Gamma$-convergence is that if
$\Gamma$-$\lim F_i = F$, $f_i$ minimizes $F_i$ and $f_i \to f$, then $f$ minimizes $F$.
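The reason behind this property can be sketched in one chain of inequalities. The sketch is ours, not taken from the cited references; it assumes, as in the statement above, that the minimizers $f_i$ converge to $f$, and it writes $g_i \to g$ for a sequence supplied by the limsup condition of the definition for an arbitrary $g \in X$.

```latex
F(f) \;\le\; \liminf_{i} F_i(f_i) \;\le\; \liminf_{i} F_i(g_i)
     \;\le\; \limsup_{i} F_i(g_i) \;\le\; F(g)
```

The first inequality is the liminf condition applied to $f_i \to f$, the second holds because each $f_i$ minimizes $F_i$, and the last is the limsup condition. Since $g$ was arbitrary, $f$ minimizes $F$.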

We come back to the task of approximating the Mumford-Shah functional by 
a nicer functional. However, we can not approximate F{u) with functionals of the 
usual local integral form F^{u) = /e(Vu, u) for u G (see p. 56]). One 
of the possibilities to overcome this is to introduce a second auxiliary variable, 
which was done in 

The approximation proposed in ^ is 



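$$F_\varepsilon(u, v) = \int_\Omega \left[ v^2 |\nabla u|^2 + \beta \left( \frac{(v - 1)^2}{4\varepsilon} + \varepsilon |\nabla v|^2 \right) + \alpha |u - w|^2 \right] dx. \qquad (1)$$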



The meaning of $v$ in this functional is clear: it approximates $1 - \chi_{S_u}$, being
close to 0 when $|\nabla u|$ is large and 1 otherwise. This functional is elliptic and
is relatively easy to minimize numerically. A finite-element discretization was
proposed in , with a proof that the discretized functionals also $\Gamma$-converge to
$F(u)$ if the mesh-size is $o(\varepsilon)$.
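To make the role of $v$ concrete, the functional (1) can be evaluated on a one-dimensional signal. The sketch below is an illustration under our own assumptions (forward differences, unit grid spacing by default, a replicated last sample); it is not the finite-element discretization referred to above, and the function name is ours.

```python
def at_energy(u, v, w, alpha, beta, eps, h=1.0):
    """Discrete 1-D version of the elliptic functional (1).

    u: smoothed signal, v: edge variable (close to 0 at edges, 1 elsewhere),
    w: observed signal. Forward differences; the last sample is replicated.
    """
    n = len(u)
    energy = 0.0
    for i in range(n):
        j = min(i + 1, n - 1)
        du = (u[j] - u[i]) / h
        dv = (v[j] - v[i]) / h
        smoothing = v[i] ** 2 * du ** 2
        edge_cost = beta * ((v[i] - 1.0) ** 2 / (4.0 * eps) + eps * dv ** 2)
        fidelity = alpha * (u[i] - w[i]) ** 2
        energy += h * (smoothing + edge_cost + fidelity)
    return energy


# A step signal, reproduced exactly by u (zero fidelity cost).
w = [0.0] * 5 + [1.0] * 5
no_edge = [1.0] * 10                 # v = 1 everywhere: the jump is penalized
with_edge = [1.0] * 10
with_edge[4] = 0.0                   # v = 0 at the jump: smoothing is switched off

e_no_edge = at_energy(w, no_edge, w, alpha=0.7, beta=0.1, eps=0.5)
e_with_edge = at_energy(w, with_edge, w, alpha=0.7, beta=0.1, eps=0.5)
```

With these (arbitrary) parameter values, marking the jump with v = 0 is cheaper than paying the smoothing term across it, which is the mechanism by which $v$ comes to approximate $1 - \chi_{S_u}$.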

Other works, suggesting approximations and numerical methods for the 
Mumford-Shah functional are fTTI E!7I FTI 1771 0 inj . 



4 Generalizing Mumford-Shah Functional to Color 



The most obvious way to generalize the Mumford-Shah functional to color images
$u : \Omega \to \mathbb{R}^3$ is to use

$$F(u) = \int_\Omega (|\nabla u|^2 + \alpha |u - w|^2) + \beta \mathcal{H}^{n-1}(S_u). \qquad (2)$$



In this case the only coupling between the channels is through the common jump
set $S_u$. The approximation results from Section 3 translate to this case without
change (as noted in ) and we can use the elliptic approximation

$$F_\varepsilon(u, v) = \int_\Omega \left[ v^2 |\nabla u|^2 + \beta \left( \frac{(v - 1)^2}{4\varepsilon} + \varepsilon |\nabla v|^2 \right) + \alpha |u - w|^2 \right] dx, \qquad (3)$$

to find minimizers of $F(u)$. We minimize $F_\varepsilon$ by steepest descent. A result of
numerical minimization is shown in Figure 1. The original image was a noisy
color image also shown in Figure 1.

We want to generalize the Mumford-Shah functional

$$\int_\Omega (|\nabla u|^2 + \alpha |u - w|^2) + \beta \mathcal{H}^{n-1}(S_u)$$

to color images, using the “image as a manifold” interpretation, while the length
term $\mathcal{H}^{n-1}(S_u)$ remains the same.

Fig. 1. Results with ε = 0.05, α = 0.7, β = 0.022, t = 20 and the original image.



The fidelity term that is most consistent with the geometric approach would
be the Hausdorff distance between the two surfaces, or at least $d(u(x), w(x))$,
where $d$ is the geodesic distance in the feature space, as in 0. Yet, both
these approaches seem computationally intractable. The suggestion of
$\int_\Omega \|u - w\|_h^2$, made in [ 2 ^, (here $h_{ij}$ is the metric on the feature space, and
$\|\cdot\|_h$ is the corresponding norm on the tangent space) is easy to implement, but lacks
mathematical validity: $u - w$ is not in the tangent space. We will use the simplest
reasonable alternative, $|u - w|^2$.

The smoothing term is the area $\int \sqrt{\det g}$ or the energy $\int \det g$, where $g$ is
the metric induced on the manifold by $h$, the metric on the feature space. In
the case where $h$ is a Euclidean metric on $\Omega \times [0, 1]^3$ and

$$U(x, y) = (x, y, R, G, B) = (x, y, u^1(x, y), u^2(x, y), u^3(x, y))$$

is the embedding, we get

$$\det g = \det(dU^* \circ h \circ dU) = \gamma^2 + \gamma \sum_i |\nabla u^i|^2 + \frac{1}{2} \sum_{i,j} |\nabla u^i \times \nabla u^j|^2$$
$$= \gamma^2 + \gamma (|u_x|^2 + |u_y|^2) + |u_x \times u_y|^2 = \gamma^2 + \gamma |\nabla u|^2 + |u_x \times u_y|^2.$$

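Thus, we have two models:

$$F^1(u) = \int_\Omega \sqrt{\gamma^2 + \gamma |\nabla u|^2 + |u_x \times u_y|^2} + \alpha \int_\Omega |u - w|^2 + \beta \mathcal{H}^{n-1}(S_u),$$

$$F^2(u) = \int_\Omega (\gamma |\nabla u|^2 + |u_x \times u_y|^2) + \alpha \int_\Omega |u - w|^2 + \beta \mathcal{H}^{n-1}(S_u).$$

Note that $\gamma^2$ was dropped in the second functional, since in this case it merely
adds a constant to the functional.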
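The closed form of det g used in both models can be checked numerically. The sketch below assumes that the metric $h$ weights the two spatial coordinates by γ and the three color coordinates by 1, i.e. h = diag(γ, γ, 1, 1, 1); this choice is our reading of the role of γ, and the helper names are ours.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def det_g_from_metric(ux, uy, gamma):
    """det of the induced 2x2 metric g = dU^T h dU, h = diag(g, g, 1, 1, 1)."""
    g11 = gamma + dot(ux, ux)
    g12 = dot(ux, uy)
    g22 = gamma + dot(uy, uy)
    return g11 * g22 - g12 ** 2


def det_g_closed_form(ux, uy, gamma):
    """gamma^2 + gamma*|grad u|^2 + |u_x x u_y|^2."""
    c = cross(ux, uy)
    return gamma ** 2 + gamma * (dot(ux, ux) + dot(uy, uy)) + dot(c, c)


# Partial derivatives of the three color channels at one hypothetical pixel.
ux = [1.0, 2.0, 3.0]
uy = [0.0, 1.0, -1.0]
direct = det_g_from_metric(ux, uy, gamma=0.5)
closed = det_g_closed_form(ux, uy, gamma=0.5)
```

The two values agree because of the Lagrange identity $|u_x|^2 |u_y|^2 - (u_x \cdot u_y)^2 = |u_x \times u_y|^2$.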

However, the theory of functionals on SBV or GSBV seems to be unable
to deal with these models at the moment. It is necessary to establish lower
semicontinuity of the functionals, both to ensure existence of minimizers, and
as an important component in the $\Gamma$-convergence proofs. But theorems on
lower semicontinuity of functionals on SBV exist only for isotropic functionals
(depending only on $|\nabla u|$ and not on $\nabla u$ itself), or at least functionals with
constant rate of growth, i.e. $c|\nabla u|^r \le f(\nabla u) \le C(1 + |\nabla u|)^r$ for some $C > c > 0$
and $r > 1$. The term $|u_x \times u_y|^2$ is of order $|\nabla u|^4$, yet we can not bound it from
below by $c|\nabla u|^4$ for some $c > 0$, therefore we can not use these theorems.

The role of the term $|u_x \times u_y|^2$ is explored in M- If we assume the Lambertian
light reflection model, then $u(x, y) = (n(x, y) \cdot l) \rho(x, y)$, where $n(x, y)$ is the
unit normal to the surface, $l$ is the source direction, and $\rho(x, y)$ captures the
characteristics of the material. Assuming that for any given object $\rho(x, y) = \rho =$
const we have $u(x, y) = (n(x, y) \cdot l) \rho$, hence $\mathrm{Im}\, u \subset \mathrm{span}\{\rho\}$ and $\mathrm{rank}\, du \le 1$.
This is equivalent to $u_x \times u_y = 0$.
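A small numerical check of this equivalence: when the color is a fixed vector scaled by a spatially varying shading factor, both partial derivatives are multiples of that vector and their cross product vanishes. The values and names below are hypothetical, chosen only for illustration.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def lambertian_derivatives(shading_x, shading_y, rho):
    """Derivatives of u = s(x, y) * rho for constant rho.

    shading_x and shading_y are the partial derivatives of the scalar
    shading factor s = n . l at one pixel.
    """
    ux = [shading_x * r for r in rho]
    uy = [shading_y * r for r in rho]
    return ux, uy


rho = [0.8, 0.4, 0.2]                      # hypothetical material color
ux, uy = lambertian_derivatives(0.3, -0.7, rho)
lambertian_cross = cross(ux, uy)           # vanishes up to rounding

# Counterexample: derivatives pointing in different color directions.
generic_cross = cross([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A non-zero cross product therefore signals a change of material color rather than a change of shading.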

Thus, the term $|u_x \times u_y|^2$ in the edge indicator enforces the Lambertian model
on every smooth surface patch. It also means that taking rather small $\gamma$ makes
sense, since we expect $|u_x \times u_y|^2$ to be (almost) 0, and $|\nabla u|^2$ to be just small.

A generalization of the Mumford-Shah functional proposed here is an attempt 
to combine the nice smoothing-segmenting features of the geometric model with 
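the existing $\Gamma$-convergence results for the elliptic approximation of the original
Mumford-Shah functional. We pay for that by the loss of some of the geometric
intuition behind the manifold interpretation. The proposed models are just the
two models introduced earlier with $|u_x \times u_y|^2$ replaced by $|u_x \times u_y|$, that is

$$G^2(u) = \int_\Omega (\gamma |\nabla u|^2 + |u_x \times u_y|) + \alpha \int_\Omega |u - w|^2 + \beta \mathcal{H}^{n-1}(S_u),$$

$$G^1(u) = \int_\Omega \sqrt{\gamma^2 + \gamma |\nabla u|^2 + |u_x \times u_y|} + \alpha \int_\Omega |u - w|^2 + \beta \mathcal{H}^{n-1}(S_u).$$

Note that $|u_x \times u_y|$ enforces the Lambertian model, just as $|u_x \times u_y|^2$.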

The new functional seems to violate another important requirement, necessary
for lower semicontinuity with respect to convergence: being quasiconvex.
Besides, since the smoothing term is of linear growth, approximations similar
to those in Section 3 will converge to a functional with more interaction between
the area and the length terms, and depending on the Cantor part of $Du$. We
thus propose the functional

G^u)= f x/7 +|Vu |2 + a [ \u-w\^+/3[ dn^~^ + \D^u\{n). 

Jn Jn JSu 

The elliptic approximation for is provided in m-- 
Gl(u,v) = J u^(7|Vup -I- \ux X Uy\) -I- /3^e|Vup -I- ^ 

Results of numerical minimization by steepest descent are shown in Figure El 

Fig. 2. Results with ε = 0.25, α = 0.05, β = 0.002, γ = 0.01, t = 20.

A functional similar to the Mumford-Shah functional, but with linear growth
in the gradient is examined in Q , and it is proved in particular that $\Gamma$-$\lim G_\varepsilon = G$
(with respect to convergence), where

$$G_\varepsilon(u, v) = \int_\Omega \left[ v^2 f(|\nabla u|) + \beta \left( \varepsilon |\nabla v|^2 + \frac{(1 - v)^2}{4\varepsilon} \right) \right] dx$$

if $u, v \in H^1(\Omega)$ and $0 \le v \le 1$ a.e., and $+\infty$ otherwise,

$$G(u, v) = \int_\Omega f(|\nabla u|) + \beta \int_{S_u} \ldots \, d\mathcal{H}^{n-1} + \ldots$$

if $u \in GBV(\Omega)$ and $v = 1$ a.e., and $+\infty$ otherwise,

and $f : [0, +\infty) \to [0, +\infty)$ is convex, increasing, and $\lim_{z \to \infty} f(z)/z = 1$. With
the aim of generalizing this result to color images we define $f(z) = \sqrt{\gamma + z^2}$,



G^{u)= / 1/7 + |VuP + a / |u — wp+/3 
J n Jo 

Gl(u,v) = j 

Jo 



+ \D^u\{Q), 



+ |Vup + q;|m — zap + /3( e|Vz;p + 



dH^-^ 

(1-vr 



4e 



with domains as above. 

Upon inspection of the proofs in Q, it seems that everything remains valid 
for the vectorial case, except one part, that establishes the lower inequality for 
the one-dimensional case (n = 1 ) in a small neighborhood of a jump point. 
We can, however, provide a “replacement” for this part (the second part of 
Proposition 4.3 in P, beginning with (4.4)). 

A result of numerical minimization of G^ is shown in Figure 01 
Abstract. In this paper, we describe and compare two multiscale color
segmentation schemes based on the Gaussian multiscale and the Perona
and Malik anisotropic diffusion. The proposed segmentation schemes
consist of an extension to color images of an earlier multiscale hierarchical
watershed segmentation for scalar images. Our segmentation scheme
constructs a hierarchy among the watershed regions using the principle of
dynamics of contours in scale-space. Each contour is valuated by combining
the dynamics of contours over the successive scales. We conduct
experiments on the scale-space stacks created by the Gaussian scale-space
and the Perona and Malik anisotropic diffusion scheme. Our experimental
results consist of the comparison of both schemes with respect to
the following aspects: size and information reduction between successive
levels of the hierarchical stack, dynamics of contours in scale space and
computation time.



1 Introduction 

Color image segmentation refers to the partitioning of a multi-valued image into 
meaningful objects. The additional information, which is provided by color along 
with the continuously increasing number of applications that deal with analysis 
tasks of color images, advocate its prominent position among the interests of the 
image processing community. Recent segmentation methods tend to take the 
multiscale nature of images into account, which allows the integration of both 
the superficial and the deep image structure. The majority of the existing mul- 
tiscale schemes employ a Gaussian scale-space and subsequently suffer from its 
inherent drawbacks such as the correspondence problem: Edges at coarser scales 
are displaced, which results in the need to trace them to the original image in 
order to find their exact location. For this reason, non-linear scale-spaces were
investigated. Currently, the use of anisotropic scale-space methods, which do not
suffer or suffer less from the correspondence problem and thus allow the immediate
localization of the edges, is increasing. Their numerical restrictions to preserve
stability and their architectural properties often led to unacceptable computation
times in the past. This problem has been resolved by better numerical methods
[14]. Most of the existing segmentation schemes employ the anisotropic diffusion
as an enhancement step. In [12] an image enhancement based on the anisotropic
diffusion is used before applying a hierarchical watershed segmentation. A similar
approach is given in [6] for color images. In [2] a watershed-pyramid, which
is based on a morphological scale-space, is used to automatically segment
multi-valued images. In this technique, the watershed is applied to a coarse level,
after which the edges are propagated down to the finer layers of the pyramid.

In this paper, we extend to color images an earlier multi-resolution scheme
[10] which is based upon principles of the watershed analysis and the Gaussian
scale-space. The main motivation was the duality of the catchment basins in the
gradient magnitude image along with the simplification process which occurs
during the creation of scale-space. In our approach we examine the multiscale
behavior of the catchment basins and the gradient watersheds. The dynamics of
contours [1] is used for the valuation, over the successive scales, of the watershed
lines. An extension for color images of the Perona and Malik anisotropic diffusion
[9] is proposed. Furthermore, the numerical scheme given by Perona and
Malik in [9] is replaced by the Additive Operator Splitting (AOS) scheme [14].
The linear and non-linear multi-resolution schemes are compared on the following
aspects: (i) the size and the information reduction between successive
levels of the hierarchical stack, (ii) dynamics of contours in scale space, and (iii) the
computation time.

The paper is organized as follows: In section 2 we present an overview of the
hierarchical multi-resolution segmentation scheme and explain both scale-space
generators and their implementation for color, the dynamics of contours in
scale-space and the hierarchical scheme. The comparative study of the Gaussian
and the Perona and Malik based multiscale segmentation is explained and
illustrated with experimental results in section 3. Finally, some conclusions
are drawn in section 4 concerning the performance of both methods and the
continuation of the research.

2 Multiscale Watershed Segmentation Scheme 

2.1 Introduction 

Similar to the grey level case, the segmentation of color images using the
watershed transformation amounts to eliminating its main drawback,
namely over-segmentation. As in the grey level case, the over-segmentation
problem in color images has been treated by following four main approaches:
markers [7], flat zones [4], waterfall [5], and dynamics of contours [8].

In this paper we present a hierarchical segmentation scheme using dynamics 
of multiscale color gradient watersheds [10]. A hierarchical segmentation of an 
image is a tree structure by inclusion of connected regions. In our approach the 
tree structure construction follows a model which consists of two modules (see 
Figure 1): the salient measure module and the stopping criterion module. The first
module is dedicated to valuate each contour arc with a salient measure while the
second identifies the different hierarchical levels by using hypothesis testing.
Utilizing this algorithm enables us to construct a hierarchy among the regions
which are produced by the gradient watersheds, integrating three types of infor- 
mation into a single algorithm, namely homogeneity, contrast and scale. It has a 
superior behavior compared to hierarchies constructed by considering either the 
superficial or the deep image structure alone. 







Fig. 1. Flowchart of the constrained dynamics of contours in scale space for color 
images. 



2.2 The Salient Measure Module 

The main motivation for combining watershed analysis and scale-space is the
duality of the catchment basins of the watershed with their respective minima
in the gradient image and the simplification process which occurs during the
creation of a scale-space. The scheme used relies on the concept of the dynamics
of contours in scale-space, which incorporates a segment linking that has been 
advocated by a study of the topological changes of the critical point config- 
uration [10]. The entire process to retrieve a salient measure for the gradient 
watershed requires three steps: (i) scale generation, (ii) linking, and (iii) contour 
valuation by downward projection. 

On the localization scale So, which in our case is the original image, a gradient 
watershed is performed. This produces the minima that are used for the linking 
scheme. In our approach the linking entities are regions and not pixels. 



Dynamics of Contours in Scale Space. The linking process (parent-child
relationship) for successive levels is applied using the proximity criterion [10].
Once the parent-child relations are made, a linking list $L(p, q)$ for each region
couple $(p, q)$ at the localization scale is created. The next step is to valuate the
gradient watersheds at the localization scale $S_0$ using the notion of dynamics of
contours in scale-space (DCSS). The DCSS for an adjacent region couple $(p, q)$
is defined as the integration of all the valuations of the dynamics of contours
during the evolution in scale-space. Formally:

$$DCSS(p, q) = \sum_{l = S_0}^{S_a - 1} \ldots \qquad (1)$$

where $p$ and $q$ denote adjacent regions at the localization scale $S_0$, $S_a$ is the
annihilation scale, the difference $S_a - S_0$ is the scale space life time of the contour,
and $L(p, q)$ the linking list.

Gaussian Scale-Space. The Gaussian scale generator convolves the original
image with Gaussian kernels of increasing width. To ensure scale invariance,
the sampling of the generated scale-space follows a linear and dimensionless scale
parameter $\delta\tau$, which is related to $\sigma$ by:

$$\sigma_N = \ldots \qquad (2)$$

Anisotropic Diffusion of Perona and Malik. Inherent problems of linear
scale-space methods led to the investigation of their non-linear counterparts.
The idea was to deal with problems such as the dislocation of edges, the similar
treatment of information and noise, etc. In non-linear methods, extra information
is added to guide the diffusion process. In [9], Perona and Malik propose
an anisotropic diffusion filtering for scalar images that avoids the blurring and
localization problems of the linear diffusion filtering. They apply an inhomogeneous
process that reduces the amount of diffusion at those locations which have a
larger likelihood to be edges. This likelihood is measured by a decreasing function
of the squared gradient and is denoted the diffusivity or diffusion tensor. In
this paper we adopted a function that favors wide regions over smaller ones:

$$g(|\nabla u|^2) = \frac{1}{1 + |\nabla u|^2 / K^2} \qquad (3)$$

where the constant $K > 0$ is a contrast parameter that separates forward (low
contrast) from backward (high contrast) diffusion. The value of $K$ is problem
dependent and needs to be determined experimentally. The diffusion scheme
proposed in [9] is ill-posed and subsequently we estimated the diffusion tensor
on the gradient of a Gaussian smoothed image as was proposed by Catté et
al. in [3]. In our experiments this regularization technique was applied using a
Gaussian kernel of 0.5. In the following sections the non-linear scheme is referred
to as the Perona-Malik-Catté scheme (PMC).

The numerical scheme, given in [9], is replaced with the AOS scheme [14]
to increase the computational performance. However, even though this removes
the stability restriction on the time step, one should still restrict the time
step to avoid visible errors in the approximation of the diffused image. During our
experiments the time step never exceeds 10.

For a better comparative analysis, only those scales that are equivalent to 
the ones produced in the Gaussian scale space (Eq. 2) are considered. Using the 
relationship between the size of the Gaussian kernel and the scale-parameter t 
of the diffusion equation [13], we construct a selection scheme to determine the 
scales in which the dynamics of contours should be estimated. The time step A 
is adapted at each scale N as follows: 

A = = (4) 

The extension of the scheme for color images is obtained by diffusing each 
color channel separately using a common diffusion tensor, which is estimated 
using the Euclidean distance between the color vectors of neighboring pixels. 
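The color extension can be illustrated on a single row of RGB pixels. The sketch below is a simplification under our own assumptions: it uses the rational Perona-Malik diffusivity g(s²) = 1 / (1 + s²/K²), which may differ from the exact form of Eq. (3); it takes one explicit time step rather than an AOS step; and it omits the Gaussian pre-smoothing of the regularized scheme. All names are ours.

```python
def diffusivity(sq_dist, K):
    """Rational Perona-Malik diffusivity (assumed form)."""
    return 1.0 / (1.0 + sq_dist / (K * K))


def color_distance_sq(p, q):
    """Squared Euclidean distance between two color vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def explicit_step(row, K, dt=0.25):
    """One explicit diffusion step on a 1-D row of RGB pixels.

    A single diffusivity per pixel pair is shared by all three channels;
    no flux crosses the two ends of the row.
    """
    n = len(row)
    g = [diffusivity(color_distance_sq(row[i], row[i + 1]), K)
         for i in range(n - 1)]
    out = []
    for i in range(n):
        new_pixel = []
        for c in range(3):
            flux_right = g[i] * (row[i + 1][c] - row[i][c]) if i < n - 1 else 0.0
            flux_left = g[i - 1] * (row[i][c] - row[i - 1][c]) if i > 0 else 0.0
            new_pixel.append(row[i][c] + dt * (flux_right - flux_left))
        out.append(new_pixel)
    return out


row = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
smoothed = explicit_step(row, K=4.5)
```

Because the diffusivity is shared, an edge that is visible in any channel slows the diffusion in all channels at that location, which keeps the channels aligned.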

2.3 The Stopping Criterion Module 

This module identifies the hierarchical levels. When the contour valuation of 
adjacent region couples has terminated, a ranking of the values providing the 
priority of merging is applied. The hierarchical segmentation algorithm will be 
completed after the application of the merging stopping criterion phase which 
retrieves the different hierarchical levels $HL_k$. For this purpose, a statistical
decision is employed through a hypothesis test, leading to the creation of a new 
hierarchical level in the case that the homogeneity constraint imposed in the 
regions is violated during the region merging process [11]. 

3 Comparative Study 

The proposed segmentation schemes are compared on the following aspects: (i)
evolution of the dynamics of contours in scale-space, (ii) size and information
reduction between successive levels of the hierarchical stack and (iii) computation
time. Both schemes use the same discretization of the scale-space stack (Eqs. 2
and 4). This way we ensure that the comparison is performed on equivalent scales;
however, this also implies that we do not use an optimal discretization of the PMC
based scale-space stack. Similar remarks can be made concerning the localization
scale, for which in the case of the PMC a much coarser scale can be chosen. Figures 2
and 3 give an insight into the scale generator and the scheme used to link the local
minima over the scales. In Figure 2 we see a cross-section of the scale-space
stack, which is, as expected, decaying faster for the linear scale generator.
Figure 3 shows some mosaic images, which can be obtained after the linking of
the local minima. One notices that the segmentation quality of the mosaic images
for the non-linear scale generator is relatively high. The linear scale generator,
on the other hand, rapidly loses several important details.

Fig. 2. Scale-space stack in RGB color-space. Top: Gaussian scale generator, from left
to right: N = 0,4,7,10. Bottom: PMC diffusion with K = 4.5. N = 0,4,7,10.

The dynamics of contours in scale-space combine scale and contrast measure.
The scale-space lifetime of all significant details is longer for the PMC scheme
than for the Gaussian scheme. This is demonstrated in Figure 4, which shows
the number of regions retained at each scale. The reduction of information is
significantly slower at the finer scales for the PMC scheme; consequently the
majority of regions have a larger scale-space lifetime. Figure 5 displays the
evolution of the dynamic for two contours in scale. The first contour is the fairly
weak contour separating two green regions, which are located in the grass in
front of the house (Figure 6). The second is a very salient one, which separates
the house and the sky (Figure 6). For the Gaussian scheme both contours have
a fast exponential decrease. In the PMC scheme the dynamic of the salient contour
is more or less constant with an enhancement at coarser scales. For the weak
contour the dynamic decreases stepwise with an increasing step until it disappears.
Hence, the dynamics of contours in scale-space for the PMC scheme can
be discriminated better, which results in a more robust hierarchy among the
regions.

The homogeneity test, which does not employ the scale-space stack, uses
the uniformity information retrieved from the localization scale only. It aims to
extract the hierarchical levels that contain regions of similar saliency. The
hierarchical stack and the information reduction between the successive hierarchical
levels (number of regions) are demonstrated in Figure 4 and Figure 6, where a
selection of hierarchical levels is shown. At first sight the results of both schemes
seem very similar. Both schemes result in a hierarchical stack that has too many
oversegmented levels and does not really contain an optimal level. However, the
hierarchical stack of the PMC scheme is qualitatively better since it has a better
treatment of almost similar regions and small but salient details. Some examples
can be found in Figure 6 at Pk = 10 and Pk = 12.

Fig. 3. Mosaic images obtained after linking. Top: Gaussian scale generator, from left
to right: N = 2, 5, 10, 13 with 1988, 232, 25 and 12 regions. Bottom: PMC diffusion
with k = 4.5, N = 4, 8, 12, 16 with 2887, 1196, 609 and 273 regions.



The computation time of the PMC scheme would be significantly longer
than that of the Gaussian scheme if one were to use the explicit numerical scheme.
However, the use of the AOS numerical scheme decreases the computation
time significantly. The non-linear scheme is still slower due to the diffusion-quality
restriction on the time step and due to a larger number of regions at the finer
scales. The latter increases the computation time for the linking of the local
minima and the calculation of the dynamics of contours in scale-space, but can be
resolved by selecting an appropriate localization scale.^

4 Conclusions 

The hierarchical stacks of both schemes are more or less similar: the higher
hierarchical levels for the PMC scheme are slightly better and the lower hierarchical
levels are severely oversegmented in both schemes. At the turning point between
over- and undersegmentation in the hierarchical stack, the PMC scheme results in
a better hierarchical retrieval. The dynamics of contours retrieved using the PMC
scheme show a quantitative superiority, which leads us to believe that the
homogeneity test used causes the similarity of the hierarchical stacks. This is a
reasonable assumption since the homogeneity test only conveys the uniformity
information of the regions at the localization scale. Therefore, in our future
work we will attempt to include uniformity information from the whole
scale-space stack. Furthermore, an analytical investigation of the color gradient

^ Color versions of Figures 2, 3 and 6 are available at
http://www.etro.vub.ac.be/~iuvanham/pubs/ScaleSpace01.html.

Fig. 4. Left: Number of regions retained at each scale. Right: Number of regions at
each hierarchical level.




Fig. 5. Evolution of the dynamic of contour in scale-space. Left: weak contour. Right: 
very salient contour. 




Fig. 6. Hierarchical levels in RGB color-space. Top: Gaussian scale generator, from
top to bottom: Pk = 0, 10, 11, 12. Bottom: PMC diffusion with k = 4.5, Pk = 0, 10, 11, 12.

watersheds retrieved in the PMC scale-space is needed to optimize the
parent-child linking and the manner in which the dynamics of contours over the scales
are combined. The selection of the localization scale and the measurement scales
in non-linear scale-space also requires further study.

Abstract. In medical microscopy, image analysis offers pathologists
a modern tool, which can be applied to several problems in cancerology:
quantification of DNA content, quantification of immunostaining, nuclear
mitosis counting, characterization of tumor tissue architecture. However,
these problems need an accurate and automatic segmentation. In most
cases, the segmentation is concerned with the extraction of cell nuclei
or cell clusters. In this paper, we address the problem of the fully
automatic segmentation of grey level intensity or color images from medical
microscopy. An automatic segmentation method combining fuzzy
clustering and multiple active contour models is presented. An automatic and
fast initialization algorithm based on fuzzy clustering and morphological
tools is used to robustly identify and classify all possible seed regions
in the color image. These seeds are propagated outward simultaneously
to refine the contours of all objects. A fast level set formulation is used to
model the multiple contour evolution. Our method is illustrated through
two representative problems in cytology and histology.

Keywords: Segmentation, active contour models, level set method, fuzzy 
clustering, medical microscopy. 



1 Introduction 

Image analysis offers pathologists a modern tool, which can be applied to
several problems in cancerology: quantification of DNA content, quantification
of immunostaining, nuclear mitosis counting, characterization of tumor tissue
architecture, etc. However, its introduction in daily clinical practice requires
complete automation and standardization of procedures, together with the
evaluation of the clinical interest of the measured parameters. One of the key
steps is the segmentation process, which has to provide the objects of interest
to be measured.

Segmenting medical images of soft tissues to form regions related to
meaningful biological structures (such as cells, nuclei or organs) is a difficult
problem, due to the wide variety of structure characteristics. Many strategies can
be used; their performance depends largely on the images to be processed and on
a priori knowledge of the object features. Efforts have been made towards the
unification of the contour and region based approaches, and level set theories
have been used in the formulation of this unification [?]. To the best of our
knowledge, the only work applying the level set approach to medical microscopy
images is reported by Sarti in [?]. In this work a partial differential equation
based analysis is used as a methodology for computer-aided cytology. All the
approaches using level sets for active contours can deal with gradient or region
information and can handle topological changes automatically. However, when an
automatic segmentation or a quantitative segmentation is needed, as in medical
microscopy studies [?], robust and automatic specification of initial curves is
required.

In this paper, we present a method for automatic segmentation of biomedical
microscopic sections as a combination of a fast level set approach and fuzzy
clustering based on global color information. An initial automatic detection
algorithm based on fuzzy clustering is used to robustly identify and classify all
possible seed regions in the image. These seeds are propagated outward
simultaneously to localize the final contours of all objects in the image.

The originality of the method is to classify markers obtained by morphological
operators. The technique is fast, because the markers represent only 1 to 5%
of the total number of pixels in the image. The markers resulting from this
classification are distributed symmetrically inside the objects of interest. They
provide a good automatic initialization of the contours, which allows active
contours and level set methods to operate under good conditions.

This paper is organized as follows. In section 2, the level set algorithm is
reviewed. Section 3 presents a fast level set algorithm called the Group Marching
Algorithm [?]; it describes how we extend the latter to deal with multiple active
contour evolution. In section 4, we consider the problem of automatic
initialization of level sets, and propose automatic fuzzy clustering combined with
local morphology tools as an automatic initialization algorithm. Finally,
section 5 illustrates the robustness of the proposed method with two representative
problems of quantitative color segmentation in medical microscopy.



2 The Original Level Set Approach 



Since its introduction, the level set approach has been successfully applied to a
wide collection of problems that arise in computer vision and image processing.
Let us describe the original level set idea of Osher and Sethian [?] for tracking
the evolution of an initial front $\Gamma_0$ as it propagates in a direction normal
to itself with a speed function $F$. The main idea is to match the one-parameter
family of fronts $\Gamma_t$, where $\Gamma_t$ is the position of the front at time
$t$, with a one-parameter family of moving surfaces in such a way that the zero
level set of the surface always yields the moving front. To determine the front
propagation, it is necessary to solve a partial differential equation for the
motion of the evolving surface. Assume that the so-called level set function
$u : \mathbb{R}^2 \times \mathbb{R}^+ \to \mathbb{R}$ is such that at time
$t \ge 0$ the zero level set of $u(x,t)$ is the front $\Gamma_t$. The derivations
described in [?] yield the time-dependent level set equation:

$$\frac{\partial u}{\partial t} = F\,|\nabla u|, \qquad u(x, t = 0) = \pm d(x, \Gamma_0), \tag{1}$$

where $d(x, \Gamma_0)$ is the distance from $x$ to the curve $\Gamma_0$. The
distance is positive if $x$ is inside $\Gamma_0$ and negative if $x$ is outside.
If the non-regularized model given by equation (1) is considered, this leads to
an interesting and fast model that can take into account the simultaneous
evolution of several contours. In this model, the speed function $F$ is either
always positive or always negative. We can introduce a new variable (the arrival
time function) $T(x)$ defined by $u(x, T(x)) = 0$. In other words, $T(x)$ is the
time when $u(x,t) = 0$. If $\partial u / \partial t \neq 0$, $T$ satisfies the
stationary eikonal equation

$$|\nabla T| = \frac{1}{F}, \qquad T|_{\Gamma_0} = 0. \tag{2}$$

This equation states that the gradient of the arrival time function is inversely
proportional to the velocity of the contour at any given point. The advantage of
this formulation is that it can be numerically solved by fast techniques. Sethian
[?] combined the heap sort algorithm with a variant of Dijkstra's algorithm to
solve equation (2); this method is known as the fast marching method (FMM). The
heap sort algorithm is used to update $T$ at any specified pixel in increasing
order. If $N$ is the number of image pixels, the complexity is $O(N \log N)$.
More recently, an alternative sweeping strategy was suggested and used by Kim [?]
to derive a fast algorithm known as the Group Marching Method (GMM); its cost is
$O(N)$. The latter is used in this paper.

3 The Group Marching Algorithm 

Now let us derive a discrete version of the eikonal equation (2). The easiest
way to obtain such a discretization is to replace the gradient by the first-order
approximation [?]:

$$\left(\max(D^{-x}_{ij}T, -D^{+x}_{ij}T, 0)\right)^2 + \left(\max(D^{-y}_{ij}T, -D^{+y}_{ij}T, 0)\right)^2 = \frac{1}{F_{ij}^2} \tag{3}$$

where the standard finite differences are given by $D^{-x}_{ij}T = T_{i,j} - T_{i-1,j}$
and $D^{+x}_{ij}T = T_{i+1,j} - T_{i,j}$, where $T_{i,j}$ is the value of $T$ at
pixel $(i,j)$. The expressions of $D^{+y}_{ij}$ and $D^{-y}_{ij}$ in the other
direction are similar.
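
As a minimal sketch of how the discretized equation (3) can be solved for the value of T at a single pixel, the function below assumes unit grid spacing and takes the smallest known arrival time among the two horizontal neighbours and among the two vertical neighbours. The name `upwind_update` and its arguments are hypothetical and are not taken from the original paper.

```python
import math


def upwind_update(a, b, f):
    """Solve max(T - a, 0)^2 + max(T - b, 0)^2 = 1 / f^2 for T.

    a, b: smallest known arrival times among the horizontal and the vertical
          neighbours (math.inf when no neighbour is known in that direction).
    f:    positive speed at the pixel; unit grid spacing is assumed.
    """
    if f <= 0:
        raise ValueError("speed must be positive")
    lo, hi = min(a, b), max(a, b)
    if math.isinf(lo):
        return math.inf  # no known neighbour: nothing to propagate yet
    rhs = 1.0 / f
    if hi - lo >= rhs:
        # Only the smaller neighbour contributes: one-sided update.
        return lo + rhs
    # Both directions contribute: larger root of the quadratic in T.
    return 0.5 * (lo + hi + math.sqrt(2.0 * rhs * rhs - (hi - lo) ** 2))
```

With both neighbours at time 0 and unit speed the update returns 1/sqrt(2), the arrival time of a front moving diagonally across the pixel.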

Consider a neighborhood $\Gamma$ of the front $\Gamma_t$. In the current stage of
GMM, a group of points $G$ is selected from $\Gamma$. Some pixels of $\Gamma$ are
included in $G$ and are already labeled "completed". For the other pixels of
$\Gamma$, equation (3) is solved.

The evolution of the set of active pixels is done by choosing, at the initial time,
a subset $G$ of $\Gamma$ which corresponds to all the points that have to be
processed. The formal definition of this principle is given by:

$$F_{\Gamma,\min} = \min\{F_{ij} : (i,j) \in \Gamma\}, \qquad \delta\tau = \frac{1}{\sqrt{2}}\,F_{\Gamma,\min}, \qquad T_{\Gamma,\min} = \min\{T_{ij} : (i,j) \in \Gamma\} \tag{4}$$

and select $G$ as follows:

$$G = \{\, p = (i,j) \in \Gamma : T_p < T_{\Gamma,\min} + \delta\tau \,\}.$$

The Group Marching Method then proceeds as follows:



Initialization

• Processed pixels: all pixels under markers; assign a distance transform
value of zero to them, T(i,j) = 0, and label them idT(i,j) = 2.

• Active pixels: pixels at the outside boundary of the markers; their
distance transform is known, T(i,j) = 1/F(i,j):

* label them as idT(i,j) = 1;

* save those point indices to the interface indicator array $\Gamma$ and set
TM to be the minimum of T on those points.

• Unprocessed pixels: pixels away from the markers, T(i,j) = $\infty$; label
them as idT(i,j) = 0.

• Set $\delta\tau$ as defined in (4).

Marching Forward:

(M1) Set TM = TM + $\delta\tau$;

(M2) For each (i,j) in $\Gamma$, in the reverse order, if T(i,j) < TM, update
the solution at neighboring points (l,m) where idT(l,m) $\le$ 1;

(M3) For each (i,j) in $\Gamma$, in the forward order, if T(i,j) < TM,

(a) update the solution at neighboring points (l,m) where idT(l,m) $\le$ 1;

(b) if idT(l,m) = 0 at a neighboring point (l,m), set idT(l,m) = 1 and
save (l,m) into $\Gamma$;

(c) remove the index (i,j) from $\Gamma$; set idT(i,j) = 2;

(M4) if $\Gamma \neq \emptyset$, go to (M1).



The GMM is in fact an iterative update procedure, converging in two
iterations. Rouy and Tourin [?] have chosen all the grid points as one group and
carried out iterations up to convergence. GMM can be viewed as an intermediate
algorithm between FMM ($\delta\tau = 0$) and the purely iterative algorithm of Rouy
and Tourin ($\delta\tau \to \infty$).

This algorithm can be easily extended to deal with the evolution and labeling
of multiple curves. Let us assume we have $G$ seed regions $G_i$ to evolve.
Since the independent contours may propagate with different speeds, we label all
seeds with $G$ labels according to the results of the fuzzy classification, and
then we propagate these labels while computing the arrival times by solving the
equation:

$$|\nabla T| = \frac{1}{F_i}, \qquad T|_{G_i} = 0. \tag{5}$$



For each pixel, two properties are calculated: the arrival time and the re- 
gion label that reached that pixel first. All curves are thus allowed to evolve 
simultaneously and no limiting evolution time is necessary. 
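
The simultaneous propagation of arrival times and region labels can be sketched as follows. For brevity this sketch orders the pixels with a binary heap, as in the fast marching method, rather than with the group selection of GMM, and it assumes a single speed map shared by all regions and unit grid spacing. The names `propagate_labels`, `speed` and `seeds` are hypothetical.

```python
import heapq
import math


def _update(a, b, f):
    """Upwind solution of max(T-a,0)^2 + max(T-b,0)^2 = 1/f^2 (unit spacing)."""
    lo, hi = min(a, b), max(a, b)
    if math.isinf(lo):
        return math.inf
    rhs = 1.0 / f
    if hi - lo >= rhs:
        return lo + rhs
    return 0.5 * (lo + hi + math.sqrt(2.0 * rhs * rhs - (hi - lo) ** 2))


def propagate_labels(speed, seeds):
    """speed: 2D list of positive speeds; seeds: dict {(i, j): label}.

    Returns (T, label): the arrival time of the first front to reach each
    pixel and the label carried by that front.
    """
    rows, cols = len(speed), len(speed[0])
    T = [[math.inf] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    done = [[False] * cols for _ in range(rows)]
    heap = []
    for (i, j), lab in seeds.items():
        T[i][j] = 0.0
        label[i][j] = lab
        heapq.heappush(heap, (0.0, i, j))

    def known(i, j):
        # Only completed pixels are allowed to drive an update.
        if 0 <= i < rows and 0 <= j < cols and done[i][j]:
            return T[i][j]
        return math.inf

    while heap:
        _, i, j = heapq.heappop(heap)
        if done[i][j]:
            continue  # stale heap entry
        done[i][j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            l, m = i + di, j + dj
            if not (0 <= l < rows and 0 <= m < cols) or done[l][m]:
                continue
            a = min(known(l - 1, m), known(l + 1, m))
            b = min(known(l, m - 1), known(l, m + 1))
            t_new = _update(a, b, speed[l][m])
            if t_new < T[l][m]:
                T[l][m] = t_new
                label[l][m] = label[i][j]  # inherit the label of the front
                heapq.heappush(heap, (t_new, l, m))
    return T, label
```

Because every pixel keeps the label of the front that reached it first, no stopping time has to be chosen: the fronts simply meet where their arrival times are equal.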
The implementation of this algorithm underlies a wide range of image
analysis tasks that one typically encounters in the study of medical
microscopic images.



4 Automatic Initialization 

An essential step of the whole framework consists in estimating the features
associated with the different labels and in determining the initial seed regions.

Our method consists of two steps:

1) Detection of a set of germs (seeds), located symmetrically inside all the
objects of interest in the image. Mathematical morphology operators are mainly
used to extract these germs.

2) All or part of these germs are gathered into classes according to their
color by using a fuzzy classification. Each class i is then characterized by
region information: its mean and variance. The classification can be supervised
or unsupervised, depending on the available a priori information on the images
considered.



4.1 Feature Extraction 

A 2D color image $I$ is a function that assigns to each pixel $(x_1, x_2)$ three
grey level values in the RGB color space. The gradient amplitude is obtained from
the contour information of $I$, defined by $|\nabla I| = \sqrt{\lambda_+ + \lambda_-}$,
where $\lambda_+$ and $\lambda_-$ are the largest and smallest eigenvalues,
respectively, of the quadratic form associated with $I$. The local minima
or the h-minima of this contrast image give a set of seed regions placed nearly
symmetrically with respect to the object boundaries. The h-minima of the image
can be formulated as:

$$h_{\min}(U_0) = \{\, p \mid U_0(p) - \gamma^{rec}(U_0, U_0 + h)(p) < 0 \,\}$$

where $\gamma^{rec}(U_0, U_0 + h)$ denotes the morphological reconstruction by
erosion of the image $U_0 + h$ with $U_0$.
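
The color gradient amplitude defined at the beginning of this subsection can be sketched as below. The sketch assumes that the quadratic form is the usual 2x2 matrix of summed products of channel derivatives (an assumption, since the text does not spell it out), uses central differences with replicated borders, and stores the image as nested lists of (R, G, B) tuples. The function name is hypothetical.

```python
import math


def color_gradient_amplitude(image):
    """image[y][x] is an (R, G, B) tuple; returns sqrt(lambda+ + lambda-)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Replicated borders: indices are clamped to the image domain.
            xm, xp = max(x - 1, 0), min(x + 1, cols - 1)
            ym, yp = max(y - 1, 0), min(y + 1, rows - 1)
            g11 = g22 = g12 = 0.0
            for k in range(3):
                dx = (image[y][xp][k] - image[y][xm][k]) / 2.0
                dy = (image[yp][x][k] - image[ym][x][k]) / 2.0
                g11 += dx * dx
                g22 += dy * dy
                g12 += dx * dy
            # Eigenvalues of the symmetric matrix [[g11, g12], [g12, g22]].
            root = math.sqrt((g11 - g22) ** 2 + 4.0 * g12 * g12)
            lam_plus = 0.5 * (g11 + g22 + root)
            lam_minus = 0.5 * (g11 + g22 - root)
            out[y][x] = math.sqrt(max(lam_plus + lam_minus, 0.0))
    return out
```

Since the two eigenvalues sum to the trace g11 + g22, the amplitude could be computed without the eigen-decomposition; the eigenvalues are kept explicit here only to mirror the definition.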



4.2 Fuzzy Classification of Seed Regions 

For classification, a modified fuzzy c-means algorithm [14] is applied to classify
all seed pixels in a given image into $C$ classes by minimizing the following
objective function:

$$J = \sum_{i=1}^{C} \sum_{j=1}^{N} (u_{ij})^m\, d^2(x_j, c_i) - \alpha \sum_{i=1}^{C} p_i \log(p_i)$$

where $u_{ij}$ is the membership value of pixel $j$ in class $i$, such that
$\sum_{i=1}^{C} u_{ij} = 1$ $\forall j \in [1, N]$, and
$p_i = \frac{1}{N} \sum_{j=1}^{N} u_{ij}$ is interpreted as "the probability" of
all the pixels $j$ to belong to the class $i$. $c_i$ is the centroid of class $i$,
$N$ is the total number of pixels in the image, $d^2(x_j, c_i)$ is the standard
Euclidean distance and the fuzziness index $m$ is a weighting coefficient on each
fuzzy membership; $m = 2$ is a usual value.

In the algorithm, the number of classes C can be known or automatically 
determined by choosing initially a high value of C and eliminating the class i 
with the smallest probability pi. 
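
For orientation, one iteration of the standard fuzzy c-means algorithm can be sketched as follows. This is not the modified algorithm used in the paper: the entropy term of the objective function and the class elimination step are omitted, and only the classical membership and centroid updates are shown. The function name `fcm_step` is hypothetical.

```python
def fcm_step(points, centroids, m=2.0):
    """One iteration of standard fuzzy c-means.

    points:    list of feature tuples x_j (for example RGB values of seeds).
    centroids: list of tuples c_i.
    Returns (u, new_centroids) where u[i][j] is the membership of x_j in class i.
    """
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    n_classes, n_points = len(centroids), len(points)
    u = [[0.0] * n_points for _ in range(n_classes)]
    for j, x in enumerate(points):
        d = [dist2(x, c) for c in centroids]
        zeros = [i for i, v in enumerate(d) if v == 0.0]
        if zeros:
            # The point coincides with a centroid: give it full membership there.
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        inv = [v ** (-1.0 / (m - 1.0)) for v in d]
        total = sum(inv)
        for i in range(n_classes):
            u[i][j] = inv[i] / total

    dim = len(points[0])
    new_centroids = []
    for i in range(n_classes):
        w = [u[i][j] ** m for j in range(n_points)]
        s = sum(w)
        if s == 0.0:
            new_centroids.append(tuple(centroids[i]))  # empty class: keep it
            continue
        new_centroids.append(tuple(
            sum(w[j] * points[j][k] for j in range(n_points)) / s
            for k in range(dim)))
    return u, new_centroids
```

Iterating `fcm_step` until the centroids stop moving gives the usual algorithm; the memberships of each pixel sum to one over the classes, as required by the constraint above.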



5 Localization 

In order to take into account the information about regions and contours
obtained in the classification step, we consider an adaptive speed function $F$.
This function is defined at each point by the following equation:

if X) + \^ cl(x,y)\‘^\ 

F^{I){x,y) = l-e V'==i / (6) 

where /fc is the channel of color image is the mean of classes i on 
channel k and Vic is the color gradient amplitude. 

6 Biomedical Applications 

6.1 Color Cytology 

For this first biomedical application, images from serous cytology are considered.
The images are from a database of digitized cell images, collected from pleural
and peritoneal effusions with different pathologies. In this class of images, both
cytoplasm and nuclei have to be segmented. Once segmented, the cells can be
classified among cellular types (ranging from normal to abnormal). Figure 1(b)
gives the set of minima extracted from the amplitude of the gradient. From these,
markers are obtained for each class: nuclei, cytoplasm and background
(figure 1 (c), (d), (e), respectively). Figure 1(f) presents the final result.



6.2 Color Histology 

In the second example, acquisitions were performed on sections of
immunohistochemically stained tissues (figure 2 (a)). The markers involve a brown
coloration for positive nuclear locations and a blue coloration for unmarked
nuclei (negative locations). Images of this class are more complex than in the
previous case. One could be interested in many categories of objects: the clusters
of tumoral cells (called lobules in carcinoma), and the marked and unmarked tumor
nuclei presenting specific characteristics inside the clusters. The goal of this
analysis is to evaluate the immunostaining ratio, defined as the ratio of the
positive nuclear area to the whole nuclear area within the lobule limits.

Fig. 1. (a) A serous cytology color image (x20). (b) Gradient amplitude
minima. (c) Nuclei markers. (d) Cytoplasm markers. (e) Background markers. (f)
Final segmentation.



Segmentation of Lobules. The tumoral lobules are made of clusters which can
be characterized by a small inter-cellular distance and whose nuclei are larger
than those of the other cell categories (lymphocytes or stroma cells, for example).
The processing goes as follows:
a) Image simplification is used to remove lymphocytes and to make the clustering
of the other cells easier. This step uses a morphological closing performed on
each color plane ($i$ ranging from 1 to 3):
$\gamma_B(I_i) = \varepsilon_B \circ \delta_B(I_i)$, where $\delta_B$ and
$\varepsilon_B$ are the dilation and erosion of the $i$-th plane of the color
image $I$ by a flat structuring element $B$.
b) The fuzzy clustering algorithm provides reliable markers for the two different
classes of pixels to be used in the localization. The result is a binary mask
displaying the lobules (figure 2 (b)).
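
The channel-wise closing used in step a) can be sketched as a dilation followed by an erosion with a flat square structuring element. The square shape, the `radius` parameter and the truncation of the window at the image border are assumptions of this sketch; the paper does not specify the structuring element. The function names are hypothetical.

```python
def _window_op(img, radius, op):
    """Apply op (max or min) over a (2*radius+1)^2 window, truncated at borders."""
    rows, cols = len(img), len(img[0])
    return [[op(img[y][x]
                for y in range(max(i - radius, 0), min(i + radius + 1, rows))
                for x in range(max(j - radius, 0), min(j + radius + 1, cols)))
             for j in range(cols)]
            for i in range(rows)]


def dilate(img, radius=1):
    return _window_op(img, radius, max)


def erode(img, radius=1):
    return _window_op(img, radius, min)


def closing(img, radius=1):
    """Morphological closing: erosion of the dilation."""
    return erode(dilate(img, radius), radius)


def closing_rgb(planes, radius=1):
    """Apply the closing independently to each of the color planes."""
    return [closing(p, radius) for p in planes]
```

Closing removes dark details smaller than the structuring element while leaving larger structures in place: a one-pixel dark hole in a bright region is filled, and an isolated bright pixel on a dark background is left unchanged.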

Detection of the Nuclei Inside the Lobules. This process is twofold:

a) Extraction of the nuclei by residual analysis on the luminance component
($I_L$) provides a monochromatic image $I_r$ whose positive and negative values
form a binary image of nuclei $I_a$.

b) An inverted distance image $I_d$ is computed from $I_a$ and the watershed
transformation is applied to split nuclei that were initially merged. The distance
and the watershed transformations are computed by setting $F = 1$ and
$F = |\nabla I_d|$ respectively, in the eikonal equation.

Fig. 2. (a) Original histological breast cancer image (x33). (b) Segmentation
result: binary mask of lobules.

The process is limited to the lobule area by means of a logical intersection
between the image of lobules and that of all nuclei (figure 3 (b)).



Immunostaining Characterization. A simple binary thresholding, which
represents the degree of membership to the class of marked nuclei, allows the
detection of the positive pixels (brown pixels). To extract the marked nuclei, the
segmented objects are reconstructed from the positive pixels in order to assess the
total area of positive profiles.
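
Once the three binary images are available, the immunostaining ratio defined in the introduction of this section reduces to a pixel count. The sketch below assumes the masks are same-sized nested lists of 0/1 values; the function and argument names are hypothetical.

```python
def immunostaining_ratio(positive_nuclei, all_nuclei, lobules):
    """Positive nuclear area divided by total nuclear area inside the lobules."""
    positive = total = 0
    for row_p, row_n, row_l in zip(positive_nuclei, all_nuclei, lobules):
        for p, n, l in zip(row_p, row_n, row_l):
            if n and l:          # nuclear pixel lying inside a lobule
                total += 1
                if p:            # ... that also belongs to a marked nucleus
                    positive += 1
    if total == 0:
        raise ValueError("no nuclear pixels inside the lobules")
    return positive / total
```

Restricting the count to pixels where both the nuclei mask and the lobule mask are set corresponds to the logical intersection described above.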



7 Conclusion 

A fast statistical level set method for color image segmentation was presented.
This method is based on the integration of two attractive techniques: fuzzy
clustering and level set active contours. Together they can take into account
local information, such as the gradient modulus, and statistical information, such
as the mean color levels in an object. Owing to their properties, the
initialization and localization can be easily extended from 2D images to 3D images,
such as those provided by a confocal microscope.

Fig. 3. Segmentation of nuclei inside the lobules. (a) Residual analysis of the
luminance. (b) Intersection image. (c) Marked nuclei inside the lobules.

Abstract. In this paper, we embed the minimization scheme of an
automatic 3D non-rigid registration method in a multi-scale framework.
The initial model formulation was expressed as a robust multiresolution
and multigrid minimization scheme. At the finest level of the
multiresolution pyramid, we introduce a focusing strategy from coarse to fine
scales, which leads to an improvement of the accuracy of the registration
process. The focusing strategy has been tested for a linear and a non-linear
scale-space. Results on 3D ultrasound images are discussed.



1 Introduction 

Non-rigid registration can be considered as a motion estimation problem which
can be solved by minimizing an objective function. This function is an energy
which usually consists of two terms. The first term represents the interaction
between the unknown variables and the data while the second one exploits
some kind of prior information. In the context of dense motion field estimation,
Memin and Perez [?] proposed a motion estimator which makes use of the optical
flow constraint along with an associated smoothness regularizing prior. Both
terms have been constructed with an outlier rejection mechanism originating
from robust statistics. For the minimization of their functional they use a
multiresolution and multigrid scheme. The multiresolution part is dedicated to
grasping large displacements while the multigrid approach is invoked for
accelerating the estimation. This work has been extended to 3D data by Hellier
et al. [?].

In this paper, we embed the above-mentioned minimization scheme in a
multi-scale framework, aiming to improve the estimates by making them less
sensitive to acquisition noise. In the same spirit, Weber and Malik [7] propose
a model for multi-scale motion estimation. They convolve an image sequence with a
set of linear, separable spatiotemporal filter kernels and apply a robust version
of the total least squares on the filtered responses in a two-step method. Niessen
et al. [?] report a reconciliation of optical flow and scale-space theory. They
compute both zeroth and first order optic flow at multiple spatial and temporal
scales and they apply a scale selection criterion which attributes to each pixel
the optic flow at the chosen scale. Alvarez et al. [?] present an interpretation of
a classic optical flow method by Nagel and Enkelmann [?] as a tensor-driven
anisotropic diffusion approach. They avoid convergence to irrelevant local
minima by embedding their method in a linear scale-space framework.

Our work was motivated by the application of tissue deformation tracking
during surgery using 3D ultrasound. The problem of registration (motion
estimation) in ultrasound images has been treated by different researchers. Morey
and Von Ramm [?] investigate the implementation of a correlation search scheme to
estimate the 3D motion vectors and demonstrate the advantages over 2D
correlation search using the Sum of Absolute Differences (SAD) as a similarity
measure. Strintzis and Kokkinidis [?] introduce a maximum likelihood block
matching technique which corresponds to an accurate statistical description of
ultrasound images. In [?] an adaptive mesh has been proposed for non-rigid tissue
motion estimation from ultrasound image sequences. A deformable block matching
algorithm has been developed which takes into consideration both similarity
measures and the strain energy caused by mesh deformation. In [12], Pennec et al.
report results on 3D ultrasound registration using the demon's algorithm and a
straightforward minimization of the sum of squared intensity differences
criterion.

Non-rigid registration of 3D Ultrasound images poses a significant challenge 
due to the following shortcomings: (i) Low SNR of ultrasound images which are 
characterized by Rayleigh-governed speckle noise and corrupted by Gaussian- 
distributed electronic noise; (ii) motion ambiguities which arise when there is 
insufficient representation of spatial information. This holds in regions of image 
saturation or specular reflection and in homogeneous regions of weak acous- 
tic scatterers; (iii) Speckle decorrelation. Since speckle patterns result from the 
constructive and destructive interference of ultrasonic echoes from numerous 
subresolvable elements, nonuniform movement of these scatterers in the tissue 
volume can cause temporal decorrelation of the speckle patterns. 

The algorithm which is presented in this paper is designed to overcome the 
above shortcomings and lead to an accurate registration. 

The paper is organized as follows. In Section 2 we present in detail the
multiresolution and multigrid optimization scheme. Section 3 describes the
multi-scale framework in which the optimization scheme is embedded. Section 4 is
dedicated to experimental results and conclusions are drawn in Section 5.

2 Primary Registration Model 

2.1 Formulation of the Registration Problem 

In this work, the registration problem is considered as a motion estimation
problem. The optical flow hypothesis, introduced by Horn and Schunck [?], then
leads to the minimization of the following cost function:

$$U(w; f) = \sum_{s \in S} \left[\nabla f(s,t) \cdot w_s + f_t(s,t)\right]^2 + \alpha \sum_{\langle s,r \rangle \in C} \|w_s - w_r\|^2 \tag{1}$$

where $s$ is a voxel of the volume, $t$ is the temporal index of the volumes, $f$
is the luminance function, $w$ is the expected 3D displacement field, $S$ is the
voxel lattice, $C$ is the set of neighboring pairs and $\alpha$ controls the
balance between the two energy terms. The first term is the first-order Taylor
expansion of the luminance conservation equation and represents the interaction
between the field and the data, whereas the second term expresses the smoothness
constraint. Shortcomings of this formulation are well-known:

(a) The optical flow constraint (OFC) is not valid in case of large displacements 
because of the linearization. 

(b) The OFC might not be valid everywhere, because of the noise, the intensity 
non-uniformity, and occlusions. 

(c) The “real” field probably contains discontinuities that might not be pre- 
served. 

To cope with (b) and (c) limitations, the quadratic cost has been replaced by 
robust functions. To face problem (a), a multiresolution and multigrid strategy 
has been designed. 



2.2 Robust Estimators 

Cost function (1) does not make any difference between relevant data and
inconsistent data, and it is sensitive to noise. Therefore, robust M-estimators
have been introduced in the formulation [2]. An M-estimator is a function $\rho$
that is increasing on $\mathbb{R}^+$, such that (i) $\phi(u) = \rho(\sqrt{u})$ is
strictly concave on $\mathbb{R}^+$ and (ii) $\lim_{x \to \infty} \rho'(x) < \infty$.
The main benefit of robust M-estimators is the semi-quadratic formulation that can
be deduced from (i):

$$\exists \psi \in C^1([0, M], \mathbb{R}) : \forall u,\ \rho(u) = \min_{z \in [0,M]} \left(z u^2 + \psi(z)\right) \tag{2}$$
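
The semi-quadratic identity (2) can be checked numerically on a concrete estimator. The sketch below uses the Geman-McClure function $\rho(u) = u^2 / (1 + u^2)$ purely as an illustration; the text above does not say which estimators were used. For this choice $\phi(v) = v / (1 + v)$, $M = \phi'(0) = 1$, $\psi(z) = (\sqrt{z} - 1)^2$, and the minimizing weight is $z = \phi'(u^2) = 1 / (1 + u^2)^2$. All function names are hypothetical.

```python
def rho(u):
    """Geman-McClure estimator (illustrative choice, not from the paper)."""
    return u * u / (1.0 + u * u)


def psi(z):
    """Companion function of rho in the semi-quadratic form, for z in [0, 1]."""
    return (z ** 0.5 - 1.0) ** 2


def optimal_weight(u):
    """Closed-form minimizer z = phi'(u^2), with phi(v) = v / (1 + v)."""
    return 1.0 / (1.0 + u * u) ** 2


def half_quadratic_min(u, samples=20001):
    """Brute-force minimum of z*u^2 + psi(z) over a regular grid on [0, 1]."""
    return min(z * u * u + psi(z)
               for z in (k / (samples - 1) for k in range(samples)))
```

For every residual the grid minimum agrees with `rho(u)`, and the weight returned by `optimal_weight` shrinks as the residual grows, which is what makes the reweighted quadratic problem less sensitive to outliers.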

Two robust estimators have therefore been introduced: the first one on the
data term ($\rho_1$) and the second one on the regularization term ($\rho_2$).
According to (2), the minimization of cost function (1) is equivalent to the
minimization of the augmented function, noted $U^\star$:

$$U^\star(w; f) = \sum_{s \in S} \delta_s \left(\nabla f(s,t) \cdot w_s + f_t(s,t)\right)^2 + \psi_1(\delta_s) + \alpha \sum_{\langle s,r \rangle \in C} \beta_{sr} \|w_s - w_r\|^2 + \psi_2(\beta_{sr}) \tag{3}$$

where $\delta_s$ and $\beta_{sr}$ are auxiliary variables acting as "weights".
This cost function has the advantage of being quadratic with respect to $w$.
Furthermore, when a datum fits the model poorly, its contribution gets lower as
the associated weight $\delta_s$ decreases
($\delta_s = \phi_1'([\nabla f(s,t) \cdot w_s + f_t(s,t)]^2)$, and the function
$\phi_1'$ is decreasing), making this formulation more robust.

2.3 Multiresolution and Multigrid Minimization

In order to cope with large displacements, a classical incremental
multiresolution procedure has been developed. A pyramid of volumes $\{f^k\}$ is
constructed by successive Gaussian smoothing and subsampling. At the coarsest
level, the linearization of the conservation equation can hopefully be used. For
the next resolution levels, only an increment $dw^k$ is estimated to refine the
estimate $\hat{w}^k$ obtained from the previous level (Equation (4)).

$$U^k(dw^k; f^k, \hat{w}^k) = \sum_{s \in S^k} \delta^k_s \left(\nabla f^k(s + \hat{w}^k_s, t_2) \cdot dw^k_s + f^k(s + \hat{w}^k_s, t_2) - f^k(s, t_1)\right)^2 + \psi_1(\delta^k_s) + \alpha \sum_{\langle s,r \rangle \in C^k} \beta^k_{sr} \left\|(\hat{w}^k_s + dw^k_s) - (\hat{w}^k_r + dw^k_r)\right\|^2 + \psi_2(\beta^k_{sr}) \tag{4}$$



Furthermore, at each level of resolution, a multigrid minimization based on
successive partitions of the initial volume is achieved (see Fig. 1). For each cube
of a given grid level i (partition of cubes), a 12-parametric increment field is
estimated. The result over the grid level is a rough estimate of the desired
solution, and it is used to initialize the next grid level. This hierarchical
minimization strategy improves the quality and the convergence rate.
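
The pyramid construction by successive smoothing and subsampling described at the start of this subsection can be sketched on a 1D signal. The sketch assumes a binomial [1, 2, 1]/4 kernel as a small Gaussian approximation, replicated borders and subsampling by a factor of two; the real method operates on 3D volumes, and the kernel actually used is not stated. The function names are hypothetical.

```python
def smooth(signal):
    """Binomial [1, 2, 1] / 4 smoothing with replicated borders."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2.0 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]


def build_pyramid(signal, levels):
    """Return [f^0, f^1, ...]: each level is the smoothed, subsampled previous one."""
    pyramid = [[float(v) for v in signal]]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2:
            break  # cannot subsample any further
        pyramid.append(smooth(pyramid[-1])[::2])
    return pyramid
```

In an incremental scheme the estimation would start on the last (coarsest) entry of the returned list and the displacement found there would be scaled and refined on each finer level in turn.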

The partition at the coarsest grid level is initialized with a binary segmentation
mask of the structure of interest (template). The octree partition which
is thus defined is anatomically relevant. When we change grid level, each cube
is adaptively divided. The criterion of subdivision may be either a measure of
how well the model fits the data, or prior knowledge such as the presence of
an important anatomical structure where estimation must be accurate. Consequently,
we can distinguish between the regions of interest, where the estimation
must be precise, and the other regions, where further computational effort is unnecessary.
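
The pyramid construction and the coarse-to-fine hand-over of estimates described in this section can be illustrated with a minimal 1D sketch. It is not the authors' implementation: the binomial kernel [1, 2, 1]/4 stands in for the Gaussian smoothing, the names `smooth`, `build_pyramid` and `upsample_estimate` are hypothetical, and a real implementation would operate on 3D volumes.

```python
def smooth(signal):
    """Smooth with the binomial kernel [1, 2, 1] / 4 (a small Gaussian approximation)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]        # replicate the border sample
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2 * signal[i] + right) / 4)
    return out


def build_pyramid(signal, levels):
    """Return [finest, ..., coarsest]; each level is smoothed, then subsampled by 2."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        pyramid.append(smooth(pyramid[-1])[::2])
    return pyramid


def upsample_estimate(estimate):
    """Project a coarse displacement estimate onto the next finer level.

    Each sample is duplicated and doubled, because the grid spacing halves.
    """
    fine = []
    for w in estimate:
        fine.extend([2 * w, 2 * w])
    return fine


pyramid = build_pyramid([0, 0, 4, 8, 8, 8, 4, 0], levels=3)
print([len(level) for level in pyramid])  # level lengths: 8, 4, 2
```

At each finer level, only the increment relative to the projected estimate would then be estimated, as in Equation 4.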



3 Embedded Multi-scale Framework 

The multigrid scheme described above depends on a good
initialization of the flow. To improve the quality of the initial estimates we
propose to incorporate the scale of image measurements by exploring the
scale-space of the data-derived information. Specifically, since we deal with the optical
flow constraint, we experiment with two scale-spaces which are characterized by
the luminance-conserving principle. These are the linear scale-space and the
one which is constructed by the regularized version of the Perona-Malik (P&M)
algorithm. Let $f_\tau$ be the luminance of a voxel at the finest spatial resolution
which has been diffused at the scale quantization level $\tau$. Then, a linear
scale-space is denoted as:

$f_\tau = f_0 * G_{\sigma_\tau}$ (5)

where * denotes convolution, $f_0$ is the original image and $G_{\sigma_\tau}$ is the Gaussian
kernel with standard deviation $\sigma_\tau$.
Fig. 1. Example of multiresolution/multigrid minimization. For each resolution
level (on the left), a multigrid strategy (on the right) is performed. For clarity,
this is a 2D illustration of our 3D algorithm.



If no scale is preferred, the natural way to traverse a linear scale-space
is to sample it uniformly in a dimensionless
scale parameter $\delta_\tau$, which is related to $\sigma$ by:

(T, = (6) 

where $\tau$ denotes the scale quantization level.

The regularized P&M scale-space in its discretized form is denoted as:

$f_\tau = f_{\tau-1} + \lambda \sum_i c_i \, (G_\sigma * \Delta_i f)$ (7)

where $i \in \{N, S, E, W, F, B\}$ and N, S, E, W, F, B denote the Northern, Southern,
Eastern, Western, Forward and Backward neighbors, respectively.

$c_i = g(\| G_\sigma * \Delta_i f \|)$ (8)

$c_i$ is a decreasing function of the image gradient, which is determined at
a scale $\sigma$ to compensate for noise and to ensure well-posedness of the diffusion
equation. $\Delta_i f = f_i - f_*$, where $f_*$ denotes the central pixel in a 3-dimensional
mask with 6-neighbor connectivity. Two choices of $g$ are considered:

$g(\| \Delta f \|) = e^{-(\| \Delta f \| / \kappa)^2}$ (9)

$g(\| \Delta f \|) = \frac{1}{1 + (\| \Delta f \| / \kappa)^2}$ (10)

where $\kappa$ is a contrast parameter that can be interpreted as a threshold which
determines whether a gradient is significant or not.
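
To make the role of the contrast parameter concrete, the sketch below implements the two diffusivity functions of Equations (9) and (10) and one explicit update step on a 1D signal. It is a simplified illustration under stated assumptions: the 1D signal has two neighbors instead of six, the Gaussian regularization of the differences is omitted for brevity, and the step size `lam = 0.25` and all function names are hypothetical choices rather than values from the paper.

```python
import math


def g_exp(grad, kappa):
    """Diffusivity of Eq. (9): exp(-(|grad| / kappa)^2)."""
    return math.exp(-((abs(grad) / kappa) ** 2))


def g_rational(grad, kappa):
    """Diffusivity of Eq. (10): 1 / (1 + (|grad| / kappa)^2)."""
    return 1.0 / (1.0 + (abs(grad) / kappa) ** 2)


def diffusion_step(f, kappa, lam=0.25, g=g_exp):
    """One explicit 1D update: f <- f + lam * sum_i c_i * (f_i - f_centre)."""
    n = len(f)
    out = []
    for j in range(n):
        update = 0.0
        for nb in (j - 1, j + 1):              # the two 1D neighbours
            if 0 <= nb < n:
                diff = f[nb] - f[j]
                update += g(diff, kappa) * diff  # c_i * Delta_i f
        out.append(f[j] + lam * update)
    return out


step_edge = [10.0, 10.0, 10.0, 90.0, 90.0, 90.0]
after = diffusion_step(step_edge, kappa=5.0)
```

With `kappa = 5`, the jump of 80 grey levels is far above the threshold, so its diffusivity is practically zero and the edge is left untouched, whereas differences well below `kappa` are smoothed. Because the flux between two neighbors is antisymmetric, the sum of the signal is unchanged by the update.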
For non-linear diffusion schemes there is no global scale parameter because
they adapt the diffusion locally. However, we may synchronize their scale
parameter with that of linear diffusion. This is possible because the
scalar diffusivity $c_i$ in Equation 8 is constructed such that $\| c \| \le 1$. Therefore,
an upper bound is derived for the nonlinear schemes, which permits us to
use the relation between the evolution parameter $t$ and the standard deviation
of the Gaussian, $t = (1/2)\sigma^2$, for the creation of the regularized P&M scale
quantization space.

The construction of any of the above scale-spaces leads to a stack of volumes
$\{f_\tau\}$ which is the source of the data measurements for every successive
quantization scale during a coarse-to-fine parameter estimation. This can be expressed
by Equation 11:

$U^0(dw^0; f^0_\tau, \hat{w}^0) = \sum_{s \in S^0} \Big[ \delta_s^0 \big( \nabla f^0_\tau(s + \hat{w}_s^0, t_2) \cdot dw_s^0 + f^0_\tau(s + \hat{w}_s^0, t_2) - f^0_\tau(s, t_1) \big)^2 + \psi_1(\delta_s^0) \Big] + \alpha \sum_{\langle s,r \rangle \in C^0} \Big[ \beta_{sr}^0 \, \| (\hat{w}_s^0 + dw_s^0) - (\hat{w}_r^0 + dw_r^0) \|^2 + \psi_2(\beta_{sr}^0) \Big]$ (11)

$f^0_\tau$ denotes the data measurement at the finest pyramid resolution and at
scale quantization level $\tau$.

Our goal is the estimation of the parameter $\hat{w}^0$, which is refined at each
quantization scale by only an increment $dw^0$. Minimization proceeds in the same multigrid
fashion.

4 Experimental Results 

We have already mentioned in Section 1 that our efforts were motivated by
the application of tissue deformation tracking, which can result in brain shift
correction. In view of this, we have conducted a number of experiments using
an original 3D Ultrasound image (256x256x128) of the brain of an 8-month-old
baby and its deformed counterpart. The original volume was acquired
through the fontanelle. Ideally, the accuracy
of our algorithm in registering volumes should be tested in a situation where the
actual motion is known. Because it is difficult to produce known non-rigid
motion fields in biological tissues, we have chosen to simulate this phenomenon.
We have created an artificially deformed volume by using a Thin Plate Spline
deformation [3]. Although this approach produces a globally smooth deformation,
we distributed the point landmarks carefully over the whole
volume to account for local deformations. The deformed volume and the
velocity field can be seen in Figure 2(b) and Figure 2(c), respectively.

In our experimental work we aimed at an overall comparison between
the primary non-rigid registration model of Section 2 and the model with an
embedded scale-space framework of Section 3. Our evaluation is both qualitative
and quantitative. As a qualitative measure we have chosen to use the
difference image between the original volume and the reconstructed one. All of the
registration models produced difference images showing no significant differences,
implying a visually correct registration (Figure 3(b)). For the sake of comparison
we provide the difference image between the original volume and the
deformed one in Figure 3(a). The difference image in Figure 3(b) was obtained
with the algorithm which uses the embedded regularized P&M
scale-space.

For a quantitative evaluation we have considered the following measures:
(i) the mean square error (MSE); (ii) the average angular error between the correct
velocity $v_c$ and the estimated velocity $v_e$, $\psi = \arccos(v_c \cdot v_e)$; and (iii) its standard
deviation. Table 1 demonstrates the improvement in velocity estimation which
has been achieved for all three measures in the case of the embedded
scale-space framework, for both the linear and the regularized P&M case. The
latter behaves slightly better than the linear one.

Table 1. Quantitative Comparison Measures.

Method                                  MSE       Mean angular error   Std deviation
Without multi-scale framework           10.2772   14.112656°           24.254787°
Embedded Linear scale-space             9.73472   13.878700°           23.987515°
Embedded Regularized P&M scale-space    9.6945    13.791579°           23.959972°
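
The three measures can be computed along the following lines. This is a sketch under assumptions: the text does not state on which quantity the MSE is evaluated, so it is computed here on the velocity vectors; vectors are normalized before the angle is taken and are assumed to be non-zero; all function names are hypothetical.

```python
import math


def mean_square_error(true_field, est_field):
    """Mean of the squared Euclidean distances between corresponding vectors."""
    total = 0.0
    for v_true, v_est in zip(true_field, est_field):
        total += sum((a - b) ** 2 for a, b in zip(v_true, v_est))
    return total / len(true_field)


def angular_errors(true_field, est_field):
    """Angle in degrees between each pair of (non-zero) vectors."""
    errors = []
    for v_true, v_est in zip(true_field, est_field):
        norm_t = math.sqrt(sum(a * a for a in v_true))
        norm_e = math.sqrt(sum(b * b for b in v_est))
        cos_angle = sum(a * b for a, b in zip(v_true, v_est)) / (norm_t * norm_e)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against round-off
        errors.append(math.degrees(math.acos(cos_angle)))
    return errors


def mean_and_std(values):
    """Mean and (population) standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


true_field = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
est_field = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mse = mean_square_error(true_field, est_field)
mean_angle, std_angle = mean_and_std(angular_errors(true_field, est_field))
```

In this toy example the first vector is estimated exactly and the second is off by a right angle.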

Our basic argument for the use of a multi-scale framework
was that it can improve the quality of the initial estimates of the
multigrid optimization scheme, which subsequently improves the quality of
the final estimates. This is verified in Figure 3(c), which shows
in terms of MSE the improvement that occurs during successive multigrid levels
at the finest spatial resolution for all three examined cases. We observe
that without a multi-scale framework we get an initial
estimate with an MSE of 15.1756, while with the linear scale-space
we get an initial estimate with an MSE of 11.9146 and with the
regularized P&M scale-space we get an initial estimate with an MSE of
12.1082. The higher quality of the initial estimates was preserved until the final
stage of the multigrid optimization scheme.

5 Conclusions 

In this paper, we propose a methodology which embeds a multi-scale framework
in a multiresolution and multigrid optimization scheme and can lead to
a successful non-rigid registration of 3D Ultrasound images. It draws its strength
from three fundamental features which remedy the basic shortcomings
of ultrasound images. Its multigrid nature responds to motion ambiguities
in the case of insufficient representation of spatial information; the
smoothness term on the estimates counteracts the speckle decorrelation which
characterizes ultrasound; and embedding a multi-scale framework makes the
estimates less sensitive to low SNR.
As a final word, we continue to advocate the use of a multi-scale framework,
but our results do not provide definite evidence for the superiority of either the linear
or a non-linear scale-space. We intend to experiment with more non-linear
scale-spaces in order to reach a definite and general conclusion.




Fig. 2. (a) Preoperative 3D Ultrasound; (b) Simulated intraoperative (De- 
formed) 3D Ultrasound; (c) The artificial deformation field. 




Fig. 3. (a) Difference between the original and the deformed volume; (b) Difference
between the original and the reconstructed volume; (c) MSE improvement
with respect to multigrid levels at the finest spatial resolution.
Abstract. Length distributions can be estimated using a class of morphological
sieves constructed with a so-called Rotation-Invariant, Anisotropic (RIA) mor- 
phology. The RIA morphology can only be computed from an (intermediate) 
morphological orientation space, which is produced by a morphological opera- 
tion with rotated versions of an anisotropic structuring element. This structuring 
element is defined as an isotropic region in a subspace of the image space (i.e. it 
has fewer dimensions than the image). A closing or opening in this framework 
discriminates on various object lengths, such as the longest or shortest internal 
diameter. Applied in a sieve, they produce a length distribution. This distribu- 
tion is obtained from grey-value images, avoiding the need for segmentation. 
We apply it to images of rice kernels. The distributions thus obtained are com- 
pared with measurements on binarized objects in the same images. 



1 Introduction

The fraction of broken rice kernels in a batch is used to determine its quality. The 
milling process used to extract the kernels from their husk breaks a certain amount of 
them. Broken rice causes the consumer’s perception of quality to decrease, and so 
does the price. This makes it economically important to determine the fraction of 
broken kernels. 

Because manual counting is both expensive and subjective (different people appar- 
ently produce different results!), an automated system is required. A flatbed scanner is 
an ideal instrument to image rice, but it takes a lot of time to distribute the rice kernels 
on it in such a way that segmentation is possible. Therefore, we have applied a seg- 
mentation-free measurement technique to estimate the length distribution of kernels in 
an image, which can be used to derive the fraction of broken ones. It involves mor- 
phological filtering (RIA morphology) at different scales, from which a particle 
length distribution is obtained. The length of a kernel can be used to determine if it is 
broken or not. This multi-scale morphological filtering is called sieving. 

A sieve is a technique that builds a scale-space using a single morphological op- 
eration with a scale parameter. This operation has to be chosen carefully. The mor- 
phological operations that are allowed to be used in a sieve must satisfy three proper- 
ties: increasingness, extensivity and absorption [1]. In this scale-space, image features 
are separated into different levels according to some size measure. The chosen mor-
phological operation determines to what level each feature is assigned. 

Since our application requires the measurement of object length, we need a mor- 
phological operation that discriminates features based on their length. To this end, we 
have developed a morphological framework based on a structuring element that is
isotropic in a subspace of the image, and thus anisotropic in the image space itself.
Since the structuring element has full rotational freedom, this framework is
rotation-invariant. It also possesses most of the properties of regular morphology. We name it
Rotation-Invariant Anisotropic (RIA) morphology. An RIA opening removes an ob- 
ject in the image if it cannot encompass the structuring element under any orientation. 
This allows the RIA opening to discriminate objects on their characteristic lengths 
(supposing convex objects). In the case of an ellipsoid, these would be the principal 
axes. On an N-dimensional (hyper-)ellipsoid, a 1-dimensional structuring element
finds the longest axis, a 2-dimensional one the second longest, etc. An N-dimensional
structuring element is isotropic in the image space, and therefore has no rotational 
freedom; its usage reverts to regular isotropic morphology. 



2 The Sieve

Morphological sieves were first proposed by Matheron [1]. They have been exten- 
sively used with both binary and grey-value morphology to measure particle-size 
distributions. Since a sieve has an increasing scale parameter, it results in a scale- 
space. Many theoretical studies have been made, linking it with linear scale-space 
theory and other non-linear scale-spaces (see for example Alvarez and Morel [2], or 
Park and Lee [3]). A sieve can be built with any closing or opening operation $\Psi$ that
satisfies these three axioms [1]:

• Extensivity ($\Psi(f) \ge f$) or anti-extensivity ($\Psi(f) \le f$),

• Increasingness (if $f \le g$, then $\Psi(f) \le \Psi(g)$), and

• Absorption (if $\lambda \ge \nu$, then $\Psi_\lambda(\Psi_\nu(f)) = \Psi_\nu(\Psi_\lambda(f)) = \Psi_\lambda(f)$).

By definition, all openings and closings satisfy the first two axioms, but many do not 
satisfy the third one [1]. In the next section we will introduce a closing that we use in 
the application in Sect. 4, and which does satisfy all three axioms. In this section we 
illustrate the notion of sieving with a generic isotropic closing. 



2.1 The Closing Scale-Space 

We construct a (continuous) scale-space by closing ($\phi$) the image at all scales
$r \in (0, \infty)$,

$F(x, r) = \phi_{D(r)} f(x)$ . (1)
Each image $F(x, r_0)$ contains only dark features larger than $r_0$. This is the closing
scale-space. We define $F(x, 0) = f(x)$. Sampling at discrete scales $r = s[i]$, with $i \in \mathbb{N}$,
produces a sampled scale-space,

$F[x, i] = F(x, s[i])$ . (2)

For uniform scale sampling, $s[i] = i + 1$. However, if the relative error should be kept
constant, logarithmic sampling suffices. In this case, $s[i] = b^i$ with $b = 2^{1/n}$, in which n
denotes the number of samples per octave.

Some structures contain different scales. Think about a telephone cable, composed
of many bundles, each of which is made out of hundreds of thin wires. The wires are
part of two structures at different scales. The morphological scale-space as described
in this section is capable of finding both scales.

2.2 Size Distributions

The grey-value sum of each of the images in the closing scale-space generates a
cumulative size distribution, which is rotation and translation invariant, since the closing
is too [4]. By normalization, the cumulative distribution is made independent of the
image size, contrast, and the fraction of objects. It is thus defined as

$H[i] = \frac{\sum_x F[x, i] - \sum_x F[x, 0]}{\sum_x F[x, \infty] - \sum_x F[x, 0]}$ , (3)

where $F[x, \infty]$ is the original image closed with an infinitely large structuring element,
and is thus equal to an image filled with its maximum grey-value.

2.3 Implementation Aspects

When looking at the description of a sieve, it is obvious that image features composed
of grey-value ramps will be separated into many scales. This can be dealt with by an
appropriate pre-processing step (e.g. high-pass filtering, line or edge detection).

Another important question is how to sample the scale-space. There is relatively
little literature on this topic, and in most articles, one-pixel increments are used as a
default solution. However, we believe it makes sense to use logarithmic sampling,
since we might want to distinguish between 3-pixel features and 4-pixel ones, but not
between 100-pixel features and 101-pixel ones. This causes the relative error to
remain constant across the scales. We will be using four samples per octave for the
current application, which means that $s[i] = 2^{i/4}$.
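
The sieve of this section can be sketched in a few lines for a 1D signal. The sketch is only illustrative: the closing uses a flat segment of a given number of samples and only windows that lie inside the signal, the function names are hypothetical, and the integer segment sizes used in the example are an assumption (the logarithmic scales would be rounded to such sizes in practice).

```python
def log_scales(max_size, samples_per_octave=4):
    """Logarithmically sampled scales s[i] = 2**(i / n), up to max_size."""
    scales = []
    i = 0
    while 2 ** (i / samples_per_octave) <= max_size:
        scales.append(2 ** (i / samples_per_octave))
        i += 1
    return scales


def closing_1d(f, size):
    """Grey-value closing of a 1D signal with a flat segment of `size` samples.

    For each sample: the minimum, over all in-bounds windows that cover it,
    of the window maximum.
    """
    n = len(f)
    if size > n:
        return [max(f)] * n  # the segment fits nowhere: everything is filled
    window_max = [max(f[start:start + size]) for start in range(n - size + 1)]
    out = []
    for t in range(n):
        first = max(0, t - size + 1)
        last = min(t, n - size)
        out.append(min(window_max[first:last + 1]))
    return out


def cumulative_size_distribution(f, sizes):
    """Normalized grey-value sum of the closing scale-space, as in Eq. (3)."""
    total_0 = sum(f)             # scale 0: the original signal
    total_inf = max(f) * len(f)  # infinite scale: filled with the maximum
    return [(sum(closing_1d(f, s)) - total_0) / (total_inf - total_0)
            for s in sizes]


signal = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1]  # dark gaps of width 1 and 3
distribution = cumulative_size_distribution(signal, [1, 2, 3, 4])
```

In the example, the gap of width 1 is filled once the segment has 2 samples and the gap of width 3 once it has 4 samples, so the cumulative distribution rises in two steps whose heights are proportional to the gap areas.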
3 RIA Morphology

As stated in the introduction, this morphological framework is based on structuring 
elements that are isotropic in a subspace of the image, and thus anisotropic in the
image space itself. By allowing these structuring elements to rotate, we can create
rotation-invariant operators. The operators in this framework that are comparable to 
the dilation and erosion are actually not a dilation and erosion in the strict morpho- 
logical sense, since they don’t distribute with the intersection and union, respectively. 
Therefore we will name them sedimentation and wear, two words with a similar 
meaning, but without the morphological connotations. The other two operations de- 
fined in this framework are the closing and the opening. 

In this section, we use $\delta$ as the symbol for dilation, $\varepsilon$ for erosion, and, as in the
previous section, $\phi$ for closing. The structuring element is given as a subscript to these.
Translation is also denoted with a subscript: $f_x(t) = f(t - x)$.

3.1 Sedimentation and Wear

By decomposing the dilation with an isotropic structuring element D with radius r
into a union of dilations with rotated one-dimensional isotropic elements $L_\varphi$ with
radius r and orientation $\varphi$, we get

$\delta_D f = f \oplus D = f \oplus \bigcup_\varphi L_\varphi = \bigvee_\varphi \delta_{L_\varphi} f$ . (4)

Note that here $\varphi$ is taken as a multi-dimensional orientation, or orientation vector. If,
instead of taking the maximum over the dilations, we take the minimum, we get a new
operator, which we will call RIA sedimentation,

$\delta^\circ_L f = \bigwedge_\varphi \delta_{L_\varphi} f$ . (5)

Here L can be any isotropic support with fewer dimensions than the image itself, and
thus does not need to be a line. This operator takes the maximum of the image over
the structuring element, rotated in such a way as to minimize this maximum. Fig. 1
gives an example of the effect that this operator has on an object boundary. Note that
a convex object boundary is not changed, but a concave one is.

In the 2D case, in which L is a line, we can compare this sedimentation operator
with a train running along a track. The train wagons (which are joined at both ends to
the track) require some extra space at the inside of the curves. This sedimentation,
applied to a train track, and using a structuring element with the length of the wagons,
reproduces the area required by them. Note that this analogy is only true if the length
of the structuring element is small compared to the radius of curvature of the boundary. This is
always true for a train track, but not necessarily so for a grey-value image.

By duality, one can define the RIA wear as the maximum of a set of erosions with
rotated line segments.
Fig. 1. Effect of the RIA dilation on an object boundary (panels a, b and c). The white line in b
represents the original object boundary. In c, its construction



3.2 Opening and Closing



The closing is usually defined as a dilation followed by an erosion. However, it is
easier to understand (and modify) if we see it as the maximum of the image over the
support of the structuring element D, after shifting it in such a way that it minimizes
this maximum, but still hits the point t at which the operation is being evaluated. Or,
in other words, the ‘lowest’ position we can give D by shifting it over the ‘landscape’
defined by the function f,

$\phi_D f(t) = \bigwedge_{x \in D_t} \bigvee_{y \in D_x} f(y) = \bigwedge_{x \in D_t} \delta_D f(x) = \varepsilon_D \delta_D f(t)$ . (6)
In accordance with this, we define a new morphological operation, RIA closing, as the
‘lowest’ position we can give the subspace structuring element L, by shifting and
rotating it over the ‘landscape’ f such that it still hits the point x being evaluated. It is
defined by

$\phi^\circ_L f(x) = \bigwedge_\varphi \bigwedge_{y \in L_{\varphi,x}} \bigvee_{z \in L_{\varphi,y}} f(z)$ . (7)

This turns out to be the same as the minimum of the closings, at all orientations, with
the structuring element L (but not equal to an RIA sedimentation followed by an RIA
wear),

$\phi^\circ_L f = \bigwedge_\varphi \phi_{L_\varphi} f$ . (8)



According to Matheron, this operation is an algebraic closing since it is an
intersection of morphological closings [1]. This implies that extensivity, increasingness and
absorption are satisfied, and that it can be used in a sieve. The two-dimensional case is
an intersection of closings with rotated lines, which has been used before (see for
example Soille [4]), and which we will use in the next section.
By duality, one can define the RIA opening as the maximum of openings with
rotated line segments.
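
As an illustration of this definition, the following sketch computes a 2D RIA opening as the pixel-wise maximum of openings with line segments. It is a simplification under stated assumptions: only four orientations are sampled, the segment length is an odd number of pixels (so diagonal segments are geometrically longer), samples outside the image are ignored, and all names are hypothetical.

```python
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # 0, 90, 45 and 135 degrees


def _line_offsets(direction, length):
    """Pixel offsets of a centred line segment with `length` (odd) pixels."""
    half = length // 2
    return [(k * direction[0], k * direction[1]) for k in range(-half, half + 1)]


def _apply(image, offsets, reduce_fn):
    """Apply min (erosion) or max (dilation) over the in-bounds offsets."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [image[r + dr][c + dc] for dr, dc in offsets
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = reduce_fn(values)
    return out


def opening(image, offsets):
    """Erosion followed by dilation; the segment is symmetric about its centre."""
    return _apply(_apply(image, offsets, min), offsets, max)


def ria_opening(image, length):
    """Pixel-wise maximum, over the sampled orientations, of the line openings."""
    results = [opening(image, _line_offsets(d, length)) for d in DIRECTIONS]
    rows, cols = len(image), len(image[0])
    return [[max(res[r][c] for res in results) for c in range(cols)]
            for r in range(rows)]


image = [[0] * 7 for _ in range(7)]
for c in range(1, 6):
    image[1][c] = 1   # a bar of 5 pixels
for c in range(1, 4):
    image[4][c] = 1   # a bar of 3 pixels
opened = ria_opening(image, length=5)
```

In the example, the 5-pixel bar can accommodate the segment in the horizontal orientation and is preserved, while the 3-pixel bar cannot accommodate it in any orientation and is removed.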



3.3 Morphological Orientation Space and RIA Morphology

The morphological operation $\psi$ by rotated versions of an anisotropic structuring
element L can be used to construct a morphological orientation space

$f_{\psi,L}(x, y, \varphi) = \psi_{L_\varphi} f(x, y)$ . (9)

The RIA sedimentation and closing now result from a minimum projection along the
orientation axes,

$\psi^\circ_L f(x, y) = \bigwedge_\varphi f_{\psi,L}(x, y, \varphi)$ . (10)

The RIA wear and opening result from a maximum projection. In a sieve, this
orientation space would be extended with a scale dimension.



4 Length Measurement of the Rice Kernels

Fig. 2 shows two images of rice kernels obtained by placing the rice on a flatbed 
scanner. The image on the left has all kernels manually separated before acquisition, 
which takes about 15 minutes. The one on the right contains the same kernels ran- 
domly scattered on the scanning surface. As stated before, it is not trivial to correctly 
segment such an image. Thus, the classical measuring paradigm (threshold, label, 
measure the segmented objects) is not easily applied. 




Fig. 2. Two images of rice kernels. The image on the left has been made after carefully sepa- 
rating all kernels to make segmentation easy 
In total we have 10 images of the same sample, 20% of which consists of broken
kernels:

• two images with only the broken kernels (one touching, one separated),

• two images with only the intact kernels (again, one touching, one separated), and

• six images with all kernels (two touching, four separated).

Accurate and reliable measurements can be obtained by applying the sieve with an 
RIA opening to an image with all kernels separated (as the image on the left of Fig. 
2). However, that is the easy problem. An image with touching kernels will produce 
an over-estimation of the particle sizes, since groups of rice kernels can accommodate 
larger line segments than single rice kernels. The solution would be to use ‘thick 
lines’ (ellipses). If these are thick enough not to fit through the union point between 
touching rice kernels (which is usually thinner than the kernels themselves), the 
measurements produce the same results as on the first image. 



4.1 Preprocessing

To increase the accuracy of the measurements we do some preprocessing on the im- 
ages (see Fig. 3 for the results of these steps). The goals are twofold: 

1. Remove imaging artifacts, and

2. Remove kernels that are thinner than the structuring element to avoid an
underestimation of the lengths.




Fig. 3. Preprocessing of the images: first, an opening removes thin elements, which are not
counted in the length distributions (middle). Then, an error-function clip is applied (right)

Because we use thick line segments as structuring elements, all kernels and por- 
tions of kernels that are thinner than these will be put into the smallest scale of the 
granulometry. To overcome this we remove these features by an opening with a disk 
of diameter equal to the width of the line segments. Since very few rice kernels are 
too thin, removing them introduces only a very small imprecision in the measure- 
ments. The thinner portions of the kernels that are also removed cause these to be 
somewhat shorter. This yields a systematic error, an average underestimation of the 
lengths of four pixels (result obtained experimentally). This causes a shift to the left 
of the cumulative length distribution. This error would, however, also be produced by 
the introduction of thick line segments (in addition to an overestimation of the small- 
est scale). 
The second operation that is applied to the images is an error-function clip. This is
a clipping that introduces less aliasing than hard clipping [5]. Its purpose is twofold:
removing noise in the background, and equalizing the grey-value over the rice kernels.
Some of these contain a chalky portion, caused by an unbalanced growing process.
This chalky portion is imaged whiter than the rest of the kernel, and would
influence the length distribution by adding weight to the smaller scales.
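
The exact form of the clip is not given in the text. One common formulation, shown here purely as an assumption, maps a grey-value smoothly into a band of width `clip_range` centred on `threshold`, with unit slope at the centre; the function and parameter names are hypothetical.

```python
import math


def erf_clip(value, threshold, clip_range):
    """Soft clip: the output stays within threshold +/- clip_range / 2."""
    scaled = math.sqrt(math.pi) * (value - threshold) / clip_range
    return threshold + 0.5 * clip_range * math.erf(scaled)


# Grey-values far below or above the band saturate smoothly at its ends.
clipped = [erf_clip(v, threshold=128.0, clip_range=64.0) for v in (0, 120, 128, 136, 255)]
```

Unlike a hard clip, the transition to the saturated values is smooth, which is what limits the aliasing; values well inside the band are left almost unchanged.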



4.2 The Classical Measuring Paradigm

To compare our results with those obtained with an existing algorithm, we measured 
the length distribution using the Feret length measure [6] on the thresholded and seg- 
mented image. This works well on the images where the kernels have been manually 
separated before acquisition, but produces poor results on the images with touching 
kernels. The algorithm we used to determine the Feret length uses a chain-code repre- 
sentation of the object boundary, which can be easily rotated. The longest projection 
of the boundary is used as the object length. 
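
The idea of taking the longest projection can be sketched as follows. This is not the chain-code implementation referred to above: the boundary is assumed to be available as a list of (x, y) points, the orientations are sampled in one-degree steps by default, and the function name is hypothetical.

```python
import math


def feret_length(points, n_angles=180):
    """Longest projection of a set of boundary points over sampled orientations."""
    longest = 0.0
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        projections = [x * cos_t + y * sin_t for x, y in points]
        longest = max(longest, max(projections) - min(projections))
    return longest


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
length = feret_length(rectangle)  # close to the diagonal length of 5
```

Because the orientations are sampled, the result slightly underestimates the true maximum; a finer angular step reduces this error at a proportional cost.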



4.3 Results

The length distributions of the two images in Fig. 2, obtained by the proposed sieve as 
well as the Feret length, are plotted in Fig. 4. The results for both images using the 
sieve are almost identical and only slightly different from the measurement obtained 
using the classical method applied to the image with separated kernels. However, the 
classical method applied to the image with touching kernels produces a very large 
over-estimation of the sizes. Fig. 5 shows the results obtained by the sieve on the ten 
images. In all cases, the sieve applied to the images with touching kernels produces 
only a minimal overestimation of the kernel length. 




Fig. 4. Comparison of the classical segment 
and measure method, and the sieve with the 
RIA opening. For the latter, touching rice 
kernels do not influence the measurement 
very much. Note the logarithmic scaling of 
the horizontal axis 




Fig. 5. Cumulative distribution measured for 
the images. This figure shows that it is easy 
to measure the fraction of broken kernels in 
this way. The difference induced by the 
contact between rice kernels is very small 
5 Conclusion

For the application discussed in this article, as well as many other applications, seg- 
mentation is very difficult or not possible at all. Segmentation-free measurement 
techniques are therefore desirable. Sieves, a form of multi-scale morphological filter- 
ing, are very useful in this context. Sieves produce size distributions from grey-value 
images, and the measure for the size of the image features is determined by the cho- 
sen morphological operation. Since this application requires length measurement, RIA 
openings have been used in a sieve to obtain length distributions. 

RIA (Rotation-Invariant Anisotropic) openings are the openings in a new morpho- 
logical framework that results from decomposing an isotropic structuring element into 
rotated lower-dimensional isotropic structuring elements. An RIA opening only re- 
moves an image feature if the chosen structuring element does not fit under any ori- 
entation. 

The proposed sieve is applied to measure the length distribution of rice kernels ac- 
quired with a flatbed scanner. These were scattered quickly onto the scanning surface, 
so that many are touching. To minimize the influence of the touching kernels, we 
have modified the RIA openings slightly, using line segments of certain width, in- 
stead of using one-pixel thin line segments. With this modification, the obtained dis- 
tributions are almost identical for the images with separated and touching kernels. In 
contrast, the classical measuring paradigm (which uses a threshold, segmentation of 
the objects, and measuring the length based on these binarized shapes) produces in- 
correct results for the image with the touching kernels. 
Abstract. In recent years, the use of multimedia content has experienced
exponential growth. In this context, new image/video sequence
representations are becoming a necessity for many applications.
This paper deals with the structuring of video shots in terms
of various foreground key-regions and a background mosaic. Each key-region
represents a different foreground object that appears throughout the
entire sequence, in the same way that the mosaic image represents the
background information of the complete sequence. We focus on the
use of morphological tools such as connected operators or watersheds
to perform the shot analysis and the computation of the key-regions and
the mosaic. It will be shown that morphological tools are particularly
attractive for improving the robustness of the various steps of the algorithm.



1 Introduction 

Image and video sequence modeling is experiencing important developments.
Part of this evolution is due to the need to support a large number of new multimedia
services. Traditionally, digital images were represented as rectangular
arrays of pixels and digital video was seen as a flow of frames. New multimedia
applications can rely on indexing or content-based coding that allow a representation
that is more structured and hopefully closer to the real world.

The most straightforward way of representing video shots is to consider them
as a set of contiguous frames. An alternative approach is to represent them by a
subset of representative frames called key-frames. A more sophisticated approach
for shot representation involves the analysis of the spatio-temporal content of
the video shot. In [5] and [7], for instance, the representation of a video shot is
composed of a set of layers representing the background information and various
foreground layers. An attractive background representation relies on mosaic
images. Mosaics are panoramic views of the background components that
are visible during the shot. Mobile foreground objects can then be superimposed
on the mosaic representation. In the sequel, these foreground objects will
be represented by key-regions. A typical example of shot representation based on
a background mosaic and key-regions is shown in Fig. 1. The background mosaic
is presented in Fig. 1a and two key-regions are shown in Fig. 1b and 1c. Each
key-region is represented here by an appearance image $A^k_{kr}$, a contour image $C^k_{kr}$
and a texture image $T^k_{kr}$, where kr stands for key-region and k is the key-region
number. The meaning and computation of these images will be presented in this
paper. Note that the motion trajectories of the key-regions are also drawn (as
white lines) on the background mosaic.

* The authors would like to acknowledge the support of the European Commission and in
particular, the ACTS DICEMAN Project.

Fig. 1. Video shot representation with a background mosaic a) and two key-regions
b) and c): a) Background Mosaic; b) Key-region 1 (girl); c) Key-region 2 (car).
Key-regions are represented from left to right by an appearance
image $A^k_{kr}$, a contour image $C^k_{kr}$ and a texture image $T^k_{kr}$.

The extraction of foreground regions in video sequences is an active research
topic. Classical approaches mainly rely on motion information. However,
pure motion-based algorithms fail when shots present rapidly changing backgrounds,
when foreground objects present little motion with respect to the camera
or when foreground objects have a low contrast with respect to the background.
The shot representation technique proposed in this paper builds, in a
first step, a background mosaic and then uses this mosaic to extract key-regions.
Besides explaining the complete algorithm, the main focus of this paper
is to highlight the use of morphological tools such as connected operators and
watersheds to improve the robustness of the algorithm.

This paper is organized as follows. Section 2 gives an overview of the proposed
algorithm. Section 3 presents the use of motion-oriented connected operators
for outlier detection in the mosaic creation algorithm. Section 4 explains the
foreground segmentation algorithm and Section 5 the creation of key-regions. The
representation and modeling of key-regions are discussed in Section 6. Finally,
conclusions are drawn in Section 7.

2 Overview of the Algorithm 

The algorithm is outlined in Fig. 2 and involves three steps. The first one
is the background mosaic computation (top blocks of Fig. 2). The second step
extracts the shape of each key-region at each time instant (middle blocks) and
the last step combines the information obtained at each time instant and builds
the key-region models (bottom blocks). The next sections describe each step.

The background mosaic computation follows a classical approach. The
first step is to compute the dominant motion between successive input images,
$I(t)$ and $I(t-1)$. The dominant motion, $m(t)$, is assumed to represent the camera
motion and is used to warp the original frames into the same coordinate system.
The warped images are blended to produce the mosaic image, $I_{mos}$. In order to
be robust, the blending should only take into account pixels belonging to the
background. As a result, before the warping and blending step, outliers that
do not follow the dominant motion are identified. They are represented by an
outliers mask $M_{out}(t)$. In Section 3 we will show how morphological connected
operators efficiently allow the identification of outliers.

Fig. 2. Overview of the algorithm. Blocks in gray represent steps where morphological
tools are used (only the major input and output signals are represented).

The second step extracts, for each key-region k at time t, a key-region mask
$M^k_{kr}(t)$. This extraction starts with the mosaic alignment. Its goal is to produce
an estimation of the background information, $I_b(t)$, at time t. Taking into account
the dominant motion, the relevant part of the mosaic is un-warped to be compared
to the current image $I(t)$. A foreground mask, $M_{for}(t)$, is computed by
comparing the original image $I(t)$ with the background estimation $I_b(t)$. A
watershed algorithm (Section 4) is used for this step. The foreground mask $M_{for}(t)$
is an estimation of the key-regions at time t. However, this estimation is not
very reliable because it is obtained on the basis of the observation of a single
time instant. To improve the robustness of the analysis, the last step combines
the contour information of the foreground masks extracted at each past time
instant and selects the most reliable sections that have been observed to create
the mask of the key-region, $M^k_{kr}(t)$. A watershed algorithm can also be used to
combine a set of contours taking into account their reliability (Section 5).

Finally, the last step of the algorithm takes into account the key-region masks,
$M^k_{kr}(t)$, as well as the original image, $I(t)$, to update the key-region models
(see Section 6 for more details). In the following sections, we explain the use of
morphological tools for the outliers estimation (Section 3), the foreground mask
estimation (Section 4) and the key-region mask estimation (Section 5).
3 Outliers Estimation with Connected Operators

Morphological connected operators are used to detect and remove outliers that
do not follow the dominant mosaic motion in the mosaic creation step. Gray level
connected operators are operators that act by merging elementary regions called
flat zones. They cannot create new contours or modify the position of existing
boundaries between regions and, therefore, have very good contour preservation
properties. Several approaches can be used to create connected operators. We
will use the one discussed in [3]. The strategy consists in creating a region-based
tree representation of the image and applying a pruning strategy on the tree to
simplify the image (in this case, to remove the outliers).

The tree representation is called a Max-tree and is oriented towards signal
maxima. Each node $\mathcal{N}$ in the tree represents a connected component of the space
that is extracted by the following thresholding process: for a given threshold
value T, consider the set of pixels X that have a gray level value larger than or
equal to T and the set of pixels Y that have a gray level value equal to T:

    $X = \{x : f(x) \ge T\}$ and $Y = \{x : f(x) = T\}$    (1)

The nodes $\mathcal{N}$ represent the connected components $C$ of X such that $C \cap Y \neq \emptyset$.
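The thresholding process of Eq. (1) can be illustrated with a small sketch. It is not the max-tree construction of the cited work; it only lists, for one threshold T, the connected components of X that contain a pixel of Y. The function name `components_at_level`, the use of 4-connectivity and the toy image are assumptions made for the illustration.

```python
from collections import deque

def components_at_level(f, T):
    """Connected components (4-connectivity, an assumption) of X = {f >= T}
    that contain at least one pixel of Y = {f == T}."""
    rows, cols = len(f), len(f[0])
    seen = [[False] * cols for _ in range(rows)]
    nodes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or f[r0][c0] < T:
                continue
            # Breadth-first flood fill of one connected component of X.
            comp, touches_Y = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                comp.append((r, c))
                touches_Y = touches_Y or f[r][c] == T
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not seen[rr][cc] and f[rr][cc] >= T):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if touches_Y:
                nodes.append(sorted(comp))
    return nodes

# Toy image (hypothetical): one bright blob of level 2 and one of level 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 2, 0, 1, 0],
    [0, 2, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
level_1 = components_at_level(image, 1)
level_2 = components_at_level(image, 2)
```

At T = 1 only the right blob yields a node, because the left blob contains no pixel of value exactly 1; it yields its own node at T = 2.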

The filtering strategy consists in pruning the tree and in reconstructing the
image from the resulting pruned tree. The simplification is governed by a criterion
which may involve simple notions such as size or contrast, or more complex
ones such as texture, motion or even semantic criteria. Here, the detection of
outliers is based on a motion criterion. For all input frames, the corresponding
max-tree is created. A recursive version of the mean displaced frame difference is
computed for all nodes of the trees using the dominant mosaic motion $m(t)$.
Nodes of the tree that do not follow the given motion produce a high displaced
frame difference and should be removed. The criterion is not increasing: there is no
constraint stating that if a node has to be removed, its children also have to
be removed. Therefore, a dynamic programming strategy based on the Viterbi
algorithm is used. We refer the reader to 0 for a complete description of the 
max-tree creation and the morphological filtering involved. 




[Figure panels: d) Dual filtering, e) Dark outliers]

Fig. 3. Estimation of outliers with connected operators
Fig. 3 shows an example of outliers estimation. Connected operators remove
maxima of the image that do not follow the dominant mosaic motion. Fig. 3b
and 3c show an original frame and the output of the connected filter. The filter
has removed the bright components of the outliers (the girl and the car) and has
preserved the background information. Comparison between the original and
filtered frames gives the mask corresponding to bright outliers. The estimation
of dark outliers can be done using the dual connected operator. The dual operator
$\psi^*$ is defined by $\psi^*(f) = -\psi(-f)$ and has the same effect as $\psi$ but on minima.

Fig. 3e and 3f show the filtered output and the mask corresponding to dark
outliers. The final outliers mask is shown in Fig. 3g.
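The relation between an operator acting on maxima and its dual acting on minima can be sketched on a 1-D signal. The example below swaps the motion criterion of the paper for a much simpler size criterion (a 1-D area opening), purely as an assumption for illustration; `area_opening_1d`, `dual` and the sample signal are hypothetical names and data.

```python
def area_opening_1d(f, min_len):
    """Connected operator on a 1-D integer signal: at every threshold t,
    keep only the runs of {f >= t} that are at least min_len samples long."""
    out = [min(f)] * len(f)
    for t in range(min(f) + 1, max(f) + 1):
        start = None
        for i in range(len(f) + 1):
            inside = i < len(f) and f[i] >= t
            if inside and start is None:
                start = i
            elif not inside and start is not None:
                if i - start >= min_len:
                    for j in range(start, i):
                        out[j] = t  # levels are visited in increasing order
                start = None
    return out

def dual(psi):
    """Return the dual operator psi*(f) = -psi(-f)."""
    def psi_star(f, *args):
        return [-v for v in psi([-v for v in f], *args)]
    return psi_star

signal = [3, 3, 9, 3, 3, 0, 3, 3]
bright_removed = area_opening_1d(signal, 2)      # narrow peak (9) is flattened
dark_removed = dual(area_opening_1d)(signal, 2)  # narrow dip (0) is filled
```

The difference between `signal` and each filtered output plays the role of the bright and dark outlier masks described above.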

On the other hand, classical mosaic creation algorithms try to remove outliers 
by defining a map assigning to each pixel a value representing whether it belongs 
to the foreground or to the background. The classical value assigned to each pixel 
of the weight map image is: 

    $\frac{1}{c + |I(t)[x] - I(t-1)[x - m(t)[x]]|}$    (2)




Fig. 4. Comparison of mosaic creation without a) and with b) connected operators.

Fig. 4 compares the classical solution with the one proposed using connected
operators. The classical approach does not allow the elimination of outliers that
occupy a significant portion of the image (such as the girl). A dark shadow is clearly
visible in the lower right part of the mosaic of Fig. 4a. Moreover, the partial
elimination of outliers has a strong effect on the successive warping and blending
steps: strong geometrical deformations appear on the lower right part of Fig. 4a.



4 Foreground Mask Estimation with Watershed 

The foreground mask extraction process is outlined in Fig. 5. Using the dominant
motion, the relevant part of the mosaic is un-warped to produce an estimation
of the background $I_b(t)$ at time t, that can be compared to the current image
$I(t)$. All the relevant information is concentrated in the image $I(t) - I_b(t)$. The
foreground mask, $M_{for}(t)$, is computed by using a watershed algorithm. The
watershed algorithm is applied on a gradient image and uses markers to initiate
the propagation process.

[Figure panels: Original frame $I(t)$, Gradient image]

Fig. 5. Estimation of the foreground mask

The gradient image should indicate the contours of the foreground mask. It is
mainly computed from the image gradient of $I(t) - I_b(t)$. However, this gradient
highlights contours but also textured areas. To solve this drawback, the gradient
is weighted (pixel by pixel) by a temporal gradient, $\mathcal{G}\{I(t) - I(t-1)\}$, where
$\mathcal{G}$ denotes the gradient operator:

    $G(t) = \mathcal{G}\{I(t) - I_b(t)\} \cdot (\mathcal{G}\{I(t) - I(t-1)\} \vee G_0)$    (3)

where $\vee$ denotes the maximum and $G_0$ is used as a lower bound of the weighting
gradient so that the weight is not too low on static areas.
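Eq. (3) can be written almost literally with NumPy. The sketch below is only an illustration: the paper does not state which gradient operator is used, so a central-difference gradient magnitude is assumed here, and the names `grad_mag`, `weighted_gradient` and the value of `G0` are hypothetical.

```python
import numpy as np

def grad_mag(img):
    """Gradient magnitude by central differences (assumed stand-in for the
    gradient operator of Eq. (3))."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def weighted_gradient(I_t, I_prev, I_b, G0=1.0):
    """G(t) = grad{I(t) - I_b(t)} * max(grad{I(t) - I(t-1)}, G0), pointwise."""
    I_t = np.asarray(I_t, dtype=float)  # avoid unsigned wrap-around on subtraction
    spatial = grad_mag(I_t - np.asarray(I_b, dtype=float))
    temporal = np.maximum(grad_mag(I_t - np.asarray(I_prev, dtype=float)), G0)
    return spatial * temporal

# Toy data (hypothetical): a bright square over an empty background.
I_b = np.zeros((6, 6))
I_t = I_b.copy()
I_t[2:4, 2:4] = 100.0
static = weighted_gradient(I_t, I_t, I_b, G0=0.5)   # no temporal change
moving = weighted_gradient(I_t, I_b, I_b, G0=0.5)   # object just appeared
```

When nothing changes between frames, the temporal term falls back to the lower bound `G0`, so the spatial gradient is attenuated but not cancelled.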

Markers are obtained by thresholding $|I(t) - I_b(t)|$ and by erosion of the
resulting masks. Two different thresholds, $t_{for}$ and $t_{back}$, are used to extract
foreground and background markers. Assume that $\varepsilon_s\{\cdot\}$ denotes a binary erosion
with a structuring element s. The foreground and background markers are defined
by $M_{for}(t) = \varepsilon_s\{|I(t) - I_b(t)| > t_{for}\}$ and $M_{back}(t) = \varepsilon_s\{|I(t) - I_b(t)| < t_{back}\}$
respectively. The threshold values were empirically chosen to be $t_{for} = 35$ and
$t_{back} = 10$, and s is a square structuring element whose side length is 2 per cent
of the original image size. Results have shown these values to be very robust,
even across different types of sequences. Foreground and background markers are
combined in a single image called Marker image in Fig. 5. In this image, the
dark (grey) areas correspond to foreground (background) markers.
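The marker extraction described above can be sketched as follows. This is a minimal illustration, assuming a plain NumPy erosion by an odd-sized square in which pixels outside the image count as background; the helper names `erode` and `markers`, the fixed `size=3` and the toy images are assumptions, while the default thresholds are the values quoted in the text.

```python
import numpy as np

def erode(mask, size):
    """Binary erosion by a size x size square (size odd). Pixels outside the
    image are treated as background, which is an assumption of this sketch."""
    pad = size // 2
    padded = np.pad(mask.astype(bool), pad, constant_values=False)
    out = np.ones(mask.shape, dtype=bool)
    for dr in range(size):
        for dc in range(size):
            out &= padded[dr:dr + mask.shape[0], dc:dc + mask.shape[1]]
    return out

def markers(I_t, I_b, t_for=35, t_back=10, size=3):
    """Foreground and background markers from the thresholded difference."""
    diff = np.abs(np.asarray(I_t, dtype=float) - np.asarray(I_b, dtype=float))
    return erode(diff > t_for, size), erode(diff < t_back, size)

# Toy data (hypothetical): a 3 x 3 bright object on an 11 x 11 background.
I_b = np.zeros((11, 11))
I_t = I_b.copy()
I_t[4:7, 4:7] = 100.0
m_for, m_back = markers(I_t, I_b)
```

The erosion leaves an unlabelled band around the object; that band is where the watershed later decides the exact contour position.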

Finally, the watershed is applied to the gradient image $G(t)$ using the markers
$M_{for}(t)$ and $M_{back}(t)$. A final step groups all connected regions into the
same connected masks and considers non-connected regions as different foreground
regions. The segmentation can be seen on the right side of Fig. 5, where
the girl has been successfully segmented from the background.
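A marker-driven watershed can be sketched as a priority flood: labelled markers grow into unlabelled pixels, always visiting the lowest-gradient frontier pixel first. The code below is a simplified variant written for illustration, not the implementation used by the authors; the function name `marker_watershed`, the 4-connectivity and the toy gradient are assumptions.

```python
import heapq
import numpy as np

def marker_watershed(gradient, labels):
    """Flood the gradient image from labelled markers (0 = unlabelled),
    expanding the lowest-gradient frontier pixel first (ties: first pushed)."""
    labels = labels.copy()
    rows, cols = gradient.shape
    heap, order = [], 0

    def push_neighbours(r, c):
        nonlocal order
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr, cc] == 0:
                heapq.heappush(heap, (gradient[rr, cc], order, rr, cc, labels[r, c]))
                order += 1

    for r, c in zip(*np.nonzero(labels)):
        push_neighbours(r, c)
    while heap:
        _, _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] == 0:  # still unlabelled: take the label that reached it
            labels[r, c] = lab
            push_neighbours(r, c)
    return labels

# Toy data (hypothetical): a vertical ridge separates two markers.
grad = np.zeros((3, 7))
grad[:, 3] = 9.0
seeds = np.zeros((3, 7), dtype=int)
seeds[1, 0] = 1   # e.g. background marker
seeds[1, 6] = 2   # e.g. foreground marker
out = marker_watershed(grad, seeds)
```

Both regions grow through the flat areas and only meet on the ridge, which is where the region boundary ends up.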

5 Key-Region Mask Definition with Watershed 

The foreground mask $M_{for}(t)$ is an estimation of the key-regions at time t.
However, this estimation is not very reliable because it is obtained on the basis
of the observation of a single time instant. To improve the robustness of the
analysis, the key-region mask estimation step combines the contour information




Morphological Tools for Robust Key-Region Extraction 413 



of the foreground masks extracted at past time instants and selects the most
reliable sections to create the mask of the key-region k, $M^k_{kr}(t)$.

The first step of the algorithm is to associate connected components of the
foreground mask $M_{for}(t)$ to key-regions that are already stored in the key-region
memory. A connected component of the foreground mask is assigned to
an existing key-region if it sufficiently overlaps with the last foreground
mask assigned to that key-region. This approach works well on common
scenes where changes between frames (at 25 or 30 fps) are usually small. If the
current foreground mask does not correspond to any known key-region, a new
key-region is created.
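The assignment rule can be sketched as below. The text only says that the component must overlap "sufficiently", so the overlap measure (fraction of the component covered by the stored mask) and the threshold `min_overlap=0.5` are assumptions of this sketch, as are the function name and the toy masks.

```python
import numpy as np

def assign_component(component, last_masks, min_overlap=0.5):
    """Return the id of the key-region whose last mask overlaps the (non-empty)
    component most, or None when no overlap ratio reaches min_overlap,
    in which case a new key-region would be created."""
    area = component.sum()
    best_id, best_ratio = None, 0.0
    for region_id, mask in last_masks.items():
        ratio = np.logical_and(component, mask).sum() / area
        if ratio >= min_overlap and ratio > best_ratio:
            best_id, best_ratio = region_id, ratio
    return best_id

# Toy data (hypothetical): two stored key-regions and two new components.
stored = {1: np.zeros((6, 6), bool), 2: np.zeros((6, 6), bool)}
stored[1][2:5, 1:4] = True
stored[2][4:6, 4:6] = True

moved = np.zeros((6, 6), bool)
moved[1:4, 1:4] = True        # key-region 1, shifted by one row
unseen = np.zeros((6, 6), bool)
unseen[0:2, 4:6] = True       # overlaps nothing that is stored

match_moved = assign_component(moved, stored)
match_unseen = assign_component(unseen, stored)
```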

Once a connected component of the foreground mask is assigned to an existing
key-region, it should be aligned to the same coordinate system. This alignment
is performed by estimating the motion between the foreground mask and
the stored key-region. After alignment, let us denote by $M_{for}(t)$ and $I(t)$ the
motion compensated versions of the foreground mask and of the input image.
These images can be seen on the left side of Fig. 6. Note
that, in this example, the contour of the foreground mask is not always reliable.
Our goal is to combine this contour information with the contour information
of the same key-region extracted at previous time instants, taking into account
the reliability of contours in time. The update of the foreground shape starts
from the compensated foreground mask $M_{for}(t)$ and is performed as follows.
Assume that I is an image and M a mask; $\mathcal{C}\{I, M\}$ denotes an image equal to
zero except on the contours of M, where it takes the values of I. The contour
reliability of the foreground mask $M_{for}(t)$ is obtained by:

    $C_{for}(t) = \mathcal{C}\{\mathcal{G}\{I(t)\}, M_{for}(t)\}$    (4)

The pixel values of this contour image are a confidence measure of the contours
of the current foreground mask $M_{for}(t)$. Low values imply that the corresponding
contour does not correspond to contrasted edges. This can occur, for
instance, when the foreground occludes a background region of the same color.
High values of the contour measure correspond to strong edges in the original
image and therefore to reliable contours. Fig. 6 illustrates the use of the contour
image to correct possible segmentation errors in the foreground mask extraction
algorithm. In this example, the foreground mask extracted at frame 2039 of the
nhkvideo7 sequence of the MPEG-7 database is of poor quality due to a low
contrast between the girl and the background at that specific time instant.
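The contour image operator used in Eq. (4) can be sketched in a few lines. The paper does not specify how the contour of a mask is extracted, so this sketch assumes the inner boundary (mask pixels with at least one 4-neighbour outside the mask); `contour_of`, `contour_image` and the toy data are hypothetical.

```python
import numpy as np

def contour_of(mask):
    """Mask pixels that have at least one 4-neighbour outside the mask
    (inner boundary, an assumption of this sketch)."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return mask.astype(bool) & ~interior

def contour_image(I, M):
    """Image equal to zero except on the contour of M, where it copies I."""
    return np.where(contour_of(M), I, 0)

# Toy data (hypothetical): I stands for the gradient image, M for the mask.
I = np.arange(25).reshape(5, 5)
M = np.zeros((5, 5), bool)
M[1:4, 1:4] = True
C = contour_image(I, M)
```

Feeding the gradient of the compensated input image as `I` and the compensated foreground mask as `M` gives a contour reliability image in the sense of Eq. (4).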

The two images on the left side of Fig. 6 show the extracted foreground mask
$M_{for}(t)$ and the measure of contour reliability $C_{for}(t)$ (as in Eq. (4)). This
measure of the contour confidence on the current foreground mask is compared
with the same measure accumulated from the previous foreground masks assigned
to the same key-region, $C^k_{kr}(t-1)$. This accumulated contour image is
part of the key-region model (see Section 6). The corresponding accumulated
contour measure of the key-region is shown on the bottom image of Fig. 6.
The combination of the current contour $C_{for}(t)$ and the accumulated contours
$C^k_{kr}(t-1)$ is done by a maximum operation:

    $C^k_{kr}(t) = \alpha C_{for}(t) \vee (1 - \alpha) C^k_{kr}(t-1)$    (5)
Fig. 6. Creation of the key-region mask $M^k_{kr}(t)$ taking into account the reliability
of the current foreground mask $M_{for}(t)$ and of past contour information $C^k_{kr}(t-1)$.



The parameter $\alpha \in [0, 1]$ controls the memory of the allowed modifications
to the shape of extracted foreground masks. If $\alpha \approx 0$, previously segmented
key-region contours are trusted more than the current contours from the foreground
mask. In this case, errors in the foreground mask are easier to fix but tracking
non-rigid foreground regions becomes more difficult. On the other hand, if $\alpha \approx 1$,
non-rigid regions are easier to track but segmentation errors are also more difficult
to correct. In our case, a value of $\alpha = 0.5$ has been used for all examples. Note
that the resulting contour values of $C^k_{kr}(t)$ are only used locally as the gradient image
for the watershed of the key-region mask definition (see Fig. 6), so the lowering
of the gradient implied by using $\alpha < 1$ is not propagated to the following frames.
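The combination rule of Eq. (5) and the role of the memory parameter can be checked on a few values. The function name `combine_contours` and the sample arrays are assumptions made for this sketch.

```python
import numpy as np

def combine_contours(C_for, C_prev, alpha=0.5):
    """Pointwise maximum of the weighted current and accumulated contours,
    as in Eq. (5)."""
    return np.maximum(alpha * C_for, (1.0 - alpha) * C_prev)

# Toy contour confidences (hypothetical) at three pixels.
C_for = np.array([10.0, 0.0, 4.0])    # current foreground contour
C_prev = np.array([0.0, 8.0, 4.0])    # accumulated key-region contour

balanced = combine_contours(C_for, C_prev, alpha=0.5)
trust_past = combine_contours(C_for, C_prev, alpha=0.0)
trust_now = combine_contours(C_for, C_prev, alpha=1.0)
```

With alpha = 0.5 a strong edge survives whether it comes from the current mask or from the past; the two extreme settings keep only one of the two sources.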

The estimation of the final mask of the foreground region, $M^k_{kr}(t)$, is done with
a watershed algorithm. The markers for this watershed consist of two points, one
inside the foreground mask and one outside (in the background). The output
of the watershed algorithm is the new foreground mask $M^k_{kr}(t)$, where the most
reliable contour parts from the foreground mask and from the assigned key-region
have been used. The resulting mask is shown on the right side of Fig. 6. The
initial error in the foreground mask shape has been eliminated and replaced by
the most reliable contour observed in the past. In general, this procedure allows
the progressive improvement of the key-region contours on a frame-by-frame
basis, taking into account the reliability of previously extracted key-region contours.

6 Key-Region Modeling 

The final step of the algorithm creates and updates a model for each key-region
observed in the scene (Key-region update block of Fig. 2). The key-region model
consists of a template of three images: an appearance image $A^k_{kr}(t)$, a contour
image $C^k_{kr}(t)$ and a texture image $T^k_{kr}(t)$. The appearance image $A^k_{kr}(t)$ shows
the frequency with which a pixel has been estimated as belonging to key-region k.
The contour image $C^k_{kr}(t)$ stores the confidence of the key-region contours and is
used to modify the input foreground masks on a frame-by-frame basis, as seen in
the previous section. Finally, the texture image $T^k_{kr}(t)$ represents the overall
texture of the key-region.

Fig. 7. Modeling of key-region k with, from left to right, an appearance image
$A^k_{kr}(t)$, a contour image $C^k_{kr}(t)$ and a texture image $T^k_{kr}(t)$.

If $M^k_{kr}(t) = 1$ denotes the pixels that have been extracted and assigned to
key-region k at time t, and $C_{for}(t)$ is the contour confidence of
the extracted foreground mask (as in Eq. (4)), the equations that update each
template image are (pointwise operations are implied):

+ AUt-l)CUt-l) + CfoAt) 

nAt) - c,At) - 

= aIa* - 1) -e Mt(t) (6) 

Fig. 7 shows the key-region template images from a scene where a person
walks in front of the camera. The appearance, contour and texture template
images contain information on the activity of the key-region. In this
case, upper body parts (body, chest) show no relative movement while lower
parts (legs) show a considerable amount of relative motion. This representation
is particularly attractive for analyzing the activity of non-rigid regions.

Fig. 1 shows a complete shot representation of the nhkvideo7 sequence. The
background information is separated from the key-regions of the scene. In the
original sequence, the camera follows the walking girl while a car crosses the
road in the background. Two key-regions have been extracted, corresponding to
the girl and the car. Fig. 1b and 1c show the corresponding template images of
the two key-regions. The motion of each key-region relative to the camera is
drawn superimposed on the final mosaic image.

7 Conclusion 

A method for representing and structuring video shots has been presented. A
robust outliers detection algorithm based on connected operators is used to estimate
and create a mosaic image of the background information of the scene. This
background information can then be used to extract representative foreground
key-regions that appear in the shot. The proposed approach uses a watershed
algorithm to extract the foreground mask on a frame-by-frame basis. These
foreground regions are refined using the reliability of previously extracted contours
and are progressively combined into key-region templates. At this step,
the watershed algorithm again turned out to be an attractive solution. Together,
the key-region templates and the mosaic image form a compact and useful
representation of the content and of the activity of the scene, allowing
further representation, indexing and analysis of the shot.
Abstract. In 3D and 4D digital image analysis, deformable objects are
considered for object recognition and shape analysis. In this paper, we
study an algebraic framework for 3-dimensional object deformation
in a discrete space by defining polyhedral sets and their set operations.



1 Introduction 

In digital image analysis, not only still objects but also moving or deforming
objects are currently considered for object recognition and shape analysis. For
instance, we use deformable object models for object segmentation from digital
images such as active contours, object deformation description for a sequence of
digital images, and thinning and skeletonization for topology-based object analysis.

In this paper, given a sequence of 3-dimensional (3D) digital images, we aim
to obtain a sequence of shape features via a sequence of polyhedral
representations of a deformable object. We use a polyhedral object representation
because we consider geometric and topological shape analysis for the final outputs.
In this paper, we focus on the first stage, in which we obtain a sequence
of deformable polyhedra from a sequence of 3D digital images. We build an
algebraic framework for 3-dimensional object deformation in a discrete space
by defining polyhedral sets and their set operations in a 3-dimensional discrete
space. Efficient extraction algorithms for topological features from a sequence
of deformable polyhedra are presented elsewhere.

There are two different representations for polyhedral sets: one is the set of all
points in a polyhedral region in the 3-dimensional Euclidean space and the other is a
set of polygons which form the boundary of a polyhedral region. We choose the
boundary representation to obtain the advantage of data compression. Algebraic
considerations on polyhedral set operations with such a boundary representation
have been studied in fields such as computer-aided design and mathematical
morphology to guarantee that the operations give correct results; if
polyhedral sets are algebraically closed, we can obtain such a guarantee. In this
paper, we apply a similar algebraic consideration to our polyhedral sets and set
operations for 3D and 4D digital image analysis.

* The current address of the first author is Laboratoire A2SI, ESIEE, Cite Descartes, 
B.P. 99, 93162 Noise-Le-Grand Cedex, France thanks to JSPS Postdoctoral Fellow- 
2 Primitives of Discrete Polyhedra

Let $\mathbf{Z}$ be the set of integers. Then $\mathbf{Z}^3$ is the set of points whose coordinates are
all integers in the three-dimensional Euclidean space $\mathbf{R}^3$. Those points are called
lattice points and $\mathbf{Z}^3$ a lattice space hereafter. In this paper, we define the set
of all polyhedra in $\mathbf{Z}^3$, which we call discrete polyhedra, such that their vertices
are all lattice points and neighboring to each other in the sense of classical
neighborhood systems such as the 6-, 18- and 26-neighborhoods in $\mathbf{Z}^3$. For any
point $x \in \mathbf{Z}^3$, they are defined as

    $N_m(x) = \{y \in \mathbf{Z}^3 : \|x - y\| \le \sqrt{t}\}$

where t = 1, 2, 3 for m = 6, 18, 26, respectively. This section is devoted to defining
the set of primitives of discrete polyhedra. We show that a finite number of
primitives is obtained due to the above constraints on discrete polyhedra.
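The three neighborhood systems can be enumerated directly from the distance bound. In the sketch below, the function name `neighbourhood` is an assumption, and the point x itself is excluded so that the set sizes match the names 6, 18 and 26 (the definition as written, with a non-strict inequality, would also contain x). Squared distances are compared to stay in integer arithmetic.

```python
from itertools import product

def neighbourhood(x, m):
    """Lattice points y != x with squared distance to x at most t,
    where t = 1, 2, 3 for m = 6, 18, 26."""
    t = {6: 1, 18: 2, 26: 3}[m]
    return {
        (x[0] + d[0], x[1] + d[1], x[2] + d[2])
        for d in product((-1, 0, 1), repeat=3)
        if 0 < d[0] ** 2 + d[1] ** 2 + d[2] ** 2 <= t
    }

sizes = {m: len(neighbourhood((0, 0, 0), m)) for m in (6, 18, 26)}
```

Offsets outside {-1, 0, 1} need not be tested, since any such offset already has a squared length of at least 4.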

2.1 Construction for Primitives of Discrete Polyhedra 

Let us consider a unit cubic region which contains eight lattice points including
$x = (i, j, k)$ such that

    $D(x) = \{(i + \epsilon_1, j + \epsilon_2, k + \epsilon_3) \in \mathbf{Z}^3 : \epsilon_l = 0 \text{ or } 1,\ l = 1, 2, 3\}$.

First, let us consider that each point in $D(\cdot)$ has a value of either one or zero;
the point is called a 1-point or a 0-point, respectively. The number of possible
arrangements of 1- and 0-points in $D(\cdot)$ is 23, as shown in Table 1 for the
26-neighborhood, if arrangements that differ only by rotations are counted once.
For each of the 23 arrangements, a convex hull of all 1-points $x_1, x_2, \ldots, x_p$
is constructed in $D(\cdot)$ such that

    $CH(\{x_1, x_2, \ldots, x_p\}) = \{x \in \mathbf{R}^3 : x = \sum_{i=1}^{p} \lambda_i x_i,\ \sum_{i=1}^{p} \lambda_i = 1,\ \lambda_i \ge 0\}$.

The dimension of $CH(\{x_1, x_2, \ldots, x_p\})$ depends on the number and arrangement
of 1-points in $D(\cdot)$. For instance, $CH(\{x_1, x_2, \ldots, x_p\})$ becomes a
0-dimensional isolated point when p = 1, a 1-dimensional line segment when p = 2,
and a 2-dimensional triangle when p = 3. When p = 4, $CH(\{x_1, x_2, x_3, x_4\})$
becomes a 2-dimensional rectangle if $x_1, x_2, x_3, x_4$ lie on a plane, and otherwise
a 3-dimensional tetrahedron. When $p \ge 5$, $CH(\{x_1, x_2, \ldots, x_p\})$ becomes
a 3-dimensional polyhedron with p vertices. Note that any 1-point $x_i$
for i = 1, 2, ..., p becomes a vertex of $CH(\{x_1, x_2, \ldots, x_p\})$. We then classify
every $CH(\{x_1, x_2, \ldots, x_p\})$ into the m-neighborhood system for m = 6, 18, 26
if any pair of adjacent vertices of $CH(\{x_1, x_2, \ldots, x_p\})$ are m-neighboring to
each other. Table 1 shows the classification results of all $CH(\{x_1, x_2, \ldots, x_p\})$
with respect to both the dimensions and neighborhood systems.
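The count of 23 arrangements can be checked by brute force: enumerate all 2^8 assignments of 0/1 to the eight corners of the unit cube and identify those that differ by one of the 24 rotations of the cube. The sketch below is an independent verification, not the authors' procedure; all function names are assumptions. The count includes the arrangement with no 1-point at all.

```python
from itertools import permutations, product

VERTICES = list(product((0, 1), repeat=3))

def rotations():
    """The 24 rotations of the cube: signed permutation matrices with det +1."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[r] if perm[r] == c else 0 for c in range(3)]
                 for r in range(3)]
            det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                   - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                   + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
            if det == 1:
                mats.append(m)
    return mats

def rotate_vertex(m, v):
    """Rotate a cube corner about the cube centre."""
    centred = [2 * a - 1 for a in v]            # {0, 1} -> {-1, +1}
    out = [sum(m[r][c] * centred[c] for c in range(3)) for r in range(3)]
    return tuple((a + 1) // 2 for a in out)     # back to {0, 1}

def count_arrangements():
    rots = rotations()
    classes = set()
    for bits in product((0, 1), repeat=8):
        ones = [v for v, b in zip(VERTICES, bits) if b]
        # Canonical representative: smallest sorted corner list over all rotations.
        classes.add(min(tuple(sorted(rotate_vertex(m, v) for v in ones))
                        for m in rots))
    return len(classes)

n_rotations = len(rotations())
n_arrangements = count_arrangements()
```

The same number follows from Burnside's lemma: (256 + 6·4 + 3·16 + 8·16 + 6·16) / 24 = 552 / 24 = 23.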

In Table 1 we see that 3-dimensional primitives for the 26-neighborhood
system are constructed whenever more than three 1-points do not lie on a plane,
and that 2-dimensional primitives are constructed whenever more than two
1-points lie on a plane. In the case of the 18-neighborhood system, 3-dimensional



Polyhedral Set Operations for 3D Discrete Object Deformation 419 



Table 1. All primitives of n-dimensional discrete polyhedra for the 6-, 18- and 
26-neighborhood systems where n = 0, 1, 2, 3. 




primitives are not constructed for the arrangements P4b, P4c, P4d and P5a
because any pair of 1-points such that the Euclidean distance between them
is $\sqrt{3}$ are not considered to be neighboring. In the case of the 6-neighborhood
system, a 3-dimensional primitive is constructed only for the arrangement P8.

2.2 Formulations of Unit Discrete Polygons and Polyhedra 

In this paper, we focus on the primitives of n = 2, 3. They are called unit discrete
polygons and polyhedra for n = 2, 3, respectively.

Definition 1. If the convex hull of p 1-points $x_1, x_2, \ldots, x_p$ in $D(\cdot)$ has two
dimensions and any pair of adjacent vertices are m-neighboring to each other
for m = 6, 18, 26, then the set of p 1-points is called a unit discrete polygon for
the m-neighborhood system and denoted by

    $S_m = \{x_1, x_2, \ldots, x_p\}$.

Let $\mathcal{Q}_m$ be the set of all unit discrete polygons for each m = 6, 18, 26. We
consequently obtain the following relations from Table 1:

    $\mathcal{Q}_6 \subset \mathcal{Q}_{18} \subset \mathcal{Q}_{26}$.    (1)

From Table 1 we also see that any unit discrete polyhedron is bounded by a
set of unit discrete polygons for each m = 6, 18, 26, just as any unit discrete polygon
is surrounded by a set of one-dimensional primitives, which are line segments.
If such unit discrete polygons are called the faces of a unit discrete polyhedron,
then the definition of unit discrete polyhedra is given as follows.
Definition 2. If the convex hull of p 1-points $x_1, x_2, \ldots, x_p$ in $D(\cdot)$ has three
dimensions and any pair of adjacent vertices are m-neighboring for m = 6, 18, 26,
then a unit discrete polyhedron is defined by

    $P_m = \{S^1_m, S^2_m, \ldots, S^q_m\}$    (2)

where $S^1_m, S^2_m, \ldots, S^q_m$ are unit discrete polygons which are the faces of $P_m$.

We set each $S^i_m = \{x_1, x_2, \ldots, x_p\}$ for i = 1, 2, ..., q to be oriented to the
exterior of $P_m$. Such an orientation of $S^i_m$ is represented by the order of points
such that the order $x_1, x_2, \ldots, x_p$ is counterclockwise when seen from a
viewpoint outside $P_m$. Let $\mathcal{H}_m$ be the set of all unit discrete polyhedra for each
m = 6, 18, 26. Similarly to (1), we then obtain

    $\mathcal{H}_6 \subset \mathcal{H}_{18} \subset \mathcal{H}_{26}$.



3 Set Operations 

The goal of this paper is to define the set of all polyhedra in $\mathbf{Z}^3$, which we call
discrete polyhedra, and to represent any deformation of such discrete polyhedra
by their boolean operations. Since a boolean operation is also used for the
recursive definition of discrete polyhedra, as shown in the next section, we first give
the definition of boolean operations in this section.

Let $\mathcal{V}_m$ be the family of sets of oriented unit discrete polygons for the
m-neighborhood system where m = 6, 18, 26. We define set operations for $\mathcal{V}_m$,
which is a larger set than the set of discrete polyhedra. For any pair of
finite sets A and B in $\mathcal{V}_m$, we define the addition operation, using the notations
$\cup$ and $\setminus$ for the union and difference of sets, such that

    $A + B = (A \setminus X_A(B)) \cup (B \setminus X_B(A))$    (3)

where

    $X_A(B) = \{S \in A : S = T^{-1} \text{ for some } T \in B\}$,
    $X_B(A) = \{T \in B : T = S^{-1} \text{ for some } S \in A\}$.

The notation $S^{-1}$ represents a unit discrete polygon whose orientation is opposite
to that of S, and the relation S = T indicates that S is the equivalent oriented
unit discrete polygon to T.
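The addition (3) can be sketched with oriented polygons stored as vertex tuples. The representation is an assumption of this sketch: two vertex orders are treated as the same oriented polygon when they differ by a cyclic shift, and the opposite orientation is the reversed order. The names `canon`, `inverse`, `add` and `cube_faces` are hypothetical, and the unit cube (all eight corners are 1-points) is used as the example polyhedron.

```python
def canon(poly):
    """Canonical form of an oriented polygon: cyclic shift that starts at
    its smallest vertex."""
    poly = tuple(poly)
    i = poly.index(min(poly))
    return poly[i:] + poly[:i]

def inverse(poly):
    """Same polygon with the opposite orientation (reversed vertex order)."""
    return canon(tuple(reversed(poly)))

def add(A, B):
    """A + B = (A minus X_A(B)) union (B minus X_B(A)), on sets of
    canonical oriented polygons."""
    X_A = {S for S in A if inverse(S) in B}
    X_B = {T for T in B if inverse(T) in A}
    return (A - X_A) | (B - X_B)

def cube_faces(o):
    """Six square faces of the unit cube with lowest corner o, each ordered
    counterclockwise when seen from outside the cube."""
    x, y, z = o
    quads = [
        ((0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)),  # x = 0
        ((1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)),  # x = 1
        ((0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)),  # y = 0
        ((0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)),  # y = 1
        ((0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)),  # z = 0
        ((0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)),  # z = 1
    ]
    return {canon((x + a, y + b, z + c) for a, b, c in q) for q in quads}

P = cube_faces((0, 0, 0))
Q = cube_faces((1, 0, 0))
P_plus_Q = add(P, Q)           # the shared face disappears from both cubes
P_plus_empty = add(P, set())   # the empty set acts as the neutral element
P_plus_inv = add(P, {inverse(S) for S in P})
```

The two cubes share one face with opposite orientations, so the sum keeps the ten remaining faces, which is the boundary of the merged box.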

If we set the empty set as the neutral element such that

    $A + \emptyset = \emptyset + A = A$

for any $A \in \mathcal{V}_m$, then we obtain the inverse element $-A$ such that

    $A + (-A) = (-A) + A = \emptyset$

and thus $-A = \{S^{-1} : S \in A\}$. The orientation of every unit discrete polygon
in $-A$ is opposite to that in A. Note that some elements in $\mathcal{V}_m$ contain unit
discrete polygons whose orientations are mixed, so that some of them are oriented
to the exterior and the others to the interior.

[Figure: P + Q = (P ∪ Q) \ {S, T}]

Fig. 1. An example of addition between two unit discrete polyhedra P and Q
for the 26-neighborhood system.

From inverse elements, we define the subtraction of B from A such that

    $A - B = A + (-B) = (A \setminus B) \cup (-(B \setminus A))$.    (4)
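The second equality in (4) is stated without its intermediate steps. A short derivation, assuming only definition (3), the inverse element $-B = \{T^{-1} : T \in B\}$ and the fact that $(S^{-1})^{-1} = S$, could read as follows:

```latex
\begin{align*}
X_A(-B)   &= \{S \in A : S = T^{-1} \text{ for some } T \in -B\}
           = \{S \in A : S \in B\} = A \cap B, \\
X_{-B}(A) &= \{T \in -B : T = S^{-1} \text{ for some } S \in A\}
           = -(A \cap B), \\
A + (-B)  &= \bigl(A \setminus (A \cap B)\bigr) \cup
             \bigl((-B) \setminus (-(A \cap B))\bigr)
           = (A \setminus B) \cup \bigl(-(B \setminus A)\bigr).
\end{align*}
```

The polygons shared by A and B cancel, the rest of A is kept as is, and the rest of B is kept with reversed orientation.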

Let us consider the addition of two unit discrete polyhedra P and Q in $\mathcal{V}_m$.
Figure 1 illustrates an example of P + Q in the case of the 18- or 26-neighborhood
system. After combining P and Q at their common faces $S \in X_P(Q)$ and
$T \in X_Q(P)$ such that $S = T^{-1}$, we exclude both S and T from the union of P
and Q and obtain P + Q.

4 Discrete Polyhedra 

4.1 Recursive Definition of Discrete Polyhedra 

Similarly to the representation of unit discrete polyhedra given by (2), we have
the boundary representation for discrete polyhedra, which are given as sets of
unit discrete polygons. By using the addition operation of (3), the set of discrete
polyhedra for the m-neighborhood system is defined from the finite set of unit
discrete polyhedra for the m-neighborhood system where m = 6, 18, 26. Assuming that
each unit discrete polygon constructing a discrete polyhedron is oriented to the
exterior such as (a) and (b) in Figure 2, we consider operations A + B such as
Figure 2(c) and A - B such as Figure 2(d), but do not consider A - B such
as Figure 2(c) and A + B such as Figure 2(d) for the construction of discrete
polyhedra. To avoid such operations when constructing every discrete polyhedron,
we use the function $W(P_m)$ for any addition of unit discrete polyhedra.




[Figure panels: (a), (b), (c), (d)]

Fig. 2. If two sets of discrete polygons (a) A and (b) B in $\mathcal{V}_m$ are given, we
consider the boolean operations A + B such as (c) and A - B such as (d) for
construction of discrete polyhedra, but do not consider A - B such as (c) and
A + B such as (d).






Yukiko Kenmochi and Atsushi Imiya 



Table 2. Conceivable adjacent relations between two unit discrete polyhedra.
Relations (a), (b) and (c) represent the adjacencies when two unit discrete
polyhedra have a common vertex, a common edge, and common unit discrete polygons
S and T such that $S = T^{-1}$, respectively. In the case of relations (d) and (e),
two adjacent unit discrete polyhedra have the unit discrete polygons S and T at
the joint, but S is not equivalent to $T^{-1}$. All examples are for the 18- or
26-neighborhood system.




Definition 3. Discrete polyhedra are recursively constructed for each m =
6, 18, 26 as follows:

1. a unit discrete polyhedron becomes a discrete polyhedron $P_m$ and we set

    $W(P_m) = \{D(x) : \bigcup_{S \in P_m} S \subset D(x)\}$;

2. if $P_m$ and $A_m$ are a discrete polyhedron and a unit discrete polyhedron
respectively such that they satisfy the following conditions:

(a) $W(P_m) \cap W(A_m) = \emptyset$;

(b) if there exist a pair of $S \in P_m$ and $T \in A_m$ in $D(x) \cap D(y) \neq \emptyset$ where
$D(x) \in W(P_m)$ and $D(y) \in W(A_m)$, then $S = T^{-1}$,

then $P'_m = P_m + A_m$ becomes a discrete polyhedron, and we set

    $W(P'_m) = W(P_m) \cup W(A_m)$.

If $A_m$ is adjacent to $P_m$, the addition of $A_m$ to $P_m$ is allowed only when
their adjacent relation is one of (a), (b) and (c) in Table 2. It is not allowed
when their adjacent relation is either (d) or (e) in Table 2. From Definition 3 we
see that a discrete polyhedron $P_m$ for m = 6, 18, 26 is constructed by combining
unit discrete polyhedra $A_m$ in all $D(x) \in W(P_m)$ one by one.

4.2 Discrete Polyhedron Construction from Volume Data 

Let $V$ be a set of volume data which is a finite subset of $Z^3$. Setting all points
in $V$ to be 1-points, $Z^3 \setminus V$ becomes the set of all 0-points in $Z^3$. Considering a
discrete polyhedron $P_m(V)$ constructed from any $V$, we obtain

$$P_m(V) = \mathop{+}\limits_{D(x) \in W} P_m(V \cap D(x)) \tag{5}$$

where

$$W = \{ D(x) : D(x) \cap V \neq \emptyset,\ x \in Z^3 \}.$$

Each $P_m(V \cap D(x))$ is a 3-dimensional unit discrete polyhedron which is
constructed with respect to the 1-points of $V$ in $D(x) \in W$ by referring to the table
of unit discrete polyhedra.
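The index set $W$ and the local point configurations $V \cap D(x)$ can be enumerated directly. The following is a minimal sketch, not the authors' implementation. It assumes that $D(x)$ is the unit cube whose eight lattice corners are $x + \{0,1\}^3$, and the names `cube`, `cubes_meeting` and `local_configurations` are hypothetical. It stops at the point configuration in each cube; looking up the unit discrete polyhedron for a configuration is left out.

```python
from itertools import product

# Corner offsets of a unit cube; assumes D(x) has corners x + {0,1}^3.
OFFSETS = tuple(product((0, 1), repeat=3))

def cube(origin):
    """Eight lattice points of the unit cube D(origin) (assumed convention)."""
    ox, oy, oz = origin
    return frozenset((ox + dx, oy + dy, oz + dz) for dx, dy, dz in OFFSETS)

def cubes_meeting(V):
    """Origins x of all cubes D(x) that contain at least one point of V."""
    W = set()
    for px, py, pz in V:
        for dx, dy, dz in OFFSETS:
            W.add((px - dx, py - dy, pz - dz))
    return W

def local_configurations(V):
    """Map each cube origin in W to the 1-points of V inside that cube."""
    V = set(V)
    return {x: frozenset(cube(x) & V) for x in cubes_meeting(V)}

if __name__ == "__main__":
    configs = local_configurations({(0, 0, 0), (1, 0, 0)})
    print(len(configs), "cubes meet V")
```

Each single point contributes eight cubes, and neighbouring points share some of them, which is why the decomposition in (5) is indexed by cubes rather than by points.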
Theorem 1. For any given finite subset $V$ of $Z^3$, we can uniquely construct a
discrete polyhedron $P_m(V)$ for each $m = 6, 18, 26$ by (5).

The proof is given in |H| . From Theorem 1, we have the guarantee of the
unique construction of $P_m(V)$ for each $V$ in a sequence. In other words, we
obtain a unique sequence of $P_m(V)$ corresponding to a sequence of $V$. This
uniqueness of $P_m(V)$ with respect to $V$ is important for the
polyhedral deformation description which is given in the next section.



5 Deformation of Polyhedral Objects 

5.1 Deformation Description by Set Operations 

Deformation of a discrete polyhedron $P_m$ is mainly classified into two types of
simple deformation. If $P_m(t)$ and $P_m(t+1)$ are discrete polyhedra before and
after deformation, respectively, then the two types of deformation from $P_m(t)$ to
$P_m(t+1)$ are described using the addition and subtraction operations such that

$$P_m(t+1) = P_m(t) + \Delta P_m, \tag{6}$$

$$P_m(t+1) = P_m(t) - \Delta P_m, \tag{7}$$

where $\Delta P_m$ is a discrete polyhedron which is the difference between $P_m(t)$ and
$P_m(t+1)$. Equations (6) and (7) are called expanding and shrinking, respectively.

5.2 Polyhedral Deformation from Sequential Volume Data 

Any deformation of the discrete polyhedra $P(V_t)$ constructed from a sequence of
volume data $V_t$ is described by a combination of the expanding and shrinking
operations (6) and (7). First, we consider the two types of simple deformation,
expanding and shrinking, caused by adding a point $x$ to $V_t$ and removing $x$
from $V_t$, respectively.

By adding $x \in Z^3 \setminus V_t$ to $V_t$, $P(V_t)$ is expanded to $P(V_t \cup \{x\})$, which is
also uniquely determined by (5), such that

$$P(V_t \cup \{x\}) = P(V_t) - \Delta P_1 + \Delta P_2 \tag{8}$$

where

$$\Delta P_1 = \mathop{+}\limits_{D(y) \in E_x} P(V_t \cap D(y)), \qquad \Delta P_2 = \mathop{+}\limits_{D(y) \in E_x} P((V_t \cup \{x\}) \cap D(y))$$

for

$$E_x = \{ D(y) : x \in D(y) \}.$$



Fig. 4. Examples of expanding $P_{18}(V)$ by adding a point $x_1$ and then subtracting
a point $x_2$; (a) a deformation description based on (9) is not obtained at the
subtraction stage and (b) a deformation description based on (8) is obtained at
every stage.

Because the added point $x$ affects only the eight unit cubes in $E_x$, we only
have to consider the polyhedral change in $E_x$. In (8), we first subtract a unit discrete
polyhedron $\Delta P_1$ in $E_x$ from $P(V_t)$ and then add $\Delta P_2$ as the replacement of
$\Delta P_1$. Figure 3 illustrates an example of expanding $P_{26}(V_t)$ by adding $x$ to $V_t$.
The reason that we do not directly apply

$$P(V_t \cup \{x\}) = P(V_t) + \Delta P \tag{9}$$

by setting

$$\Delta P = \Delta P_2 - \Delta P_1$$

is because such an operation collapses the unique decomposition of discrete
polyhedra and may cause a problem of deformation description, as shown in Figure 4.
Figure 4 illustrates two different descriptions for polyhedral deformation,
(a) and (b), which correspond to (9) and (8), respectively. The changes of volume
data are adding a point $x_1$ and then subtracting a point $x_2$ in this case. The
figure shows that the description based on (9) may not give a convenient
polyhedral decomposition in a sequence of changing volume data and may not enable
us to obtain an expected discrete polyhedron at some stage of deformation. To
avoid such problems, we need to consider the decomposition of discrete polyhedra
based on the unit cubic regions $E_x$, as in (8).

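Similarly, the operation for shrinking $P(V_t)$ to $P(V_t \setminus \{x\})$ by removing $x$
from $V_t$ is described by

$$P(V_t \setminus \{x\}) = P(V_t) - \Delta P_1 + \Delta P_3 \tag{10}$$

where

$$\Delta P_3 = \mathop{+}\limits_{D(y) \in E_x} P((V_t \setminus \{x\}) \cap D(y)).$$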

Given a sequence of volume data $V_1, V_2, \ldots$, we set $\Delta V$ to be the difference
between $V_t$ and $V_{t+1}$. If (8) or (10) is applied for every point in
$\Delta V$, we obtain $P(V_{t+1})$ from $P(V_t)$. Since the deformation description
based on (8) and (10) guarantees the unique polyhedral representation for each
step of deformation from $P(V_t)$ to $P(V_{t+1})$, the result $P(V_{t+1})$ does not
depend on the order of choosing points from $\Delta V$.
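The locality of the expanding and shrinking updates can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each unit discrete polyhedron is replaced by a stand-in, namely the set of 1-points inside its cube, $D(y)$ is assumed to have corners $y + \{0,1\}^3$, and all function names are hypothetical. Adding or removing a point rewrites only the eight cubes that contain it, and the result agrees with a rebuild from scratch regardless of the order of the updates.

```python
from itertools import product

OFFSETS = tuple(product((0, 1), repeat=3))

def cubes_containing(p):
    """Origins y of the eight unit cubes D(y) that contain point p."""
    return [(p[0] - dx, p[1] - dy, p[2] - dz) for dx, dy, dz in OFFSETS]

def cube_points(origin):
    """Eight lattice corners of D(origin) (assumed convention)."""
    return {(origin[0] + dx, origin[1] + dy, origin[2] + dz)
            for dx, dy, dz in OFFSETS}

def build(V):
    """Cube-wise decomposition of V: origin -> 1-points inside that cube."""
    V = set(V)
    return {y: frozenset(cube_points(y) & V)
            for p in V for y in cubes_containing(p)}

def update_point(table, V, x, add=True):
    """Add or remove x, rewriting only the eight cubes that contain x."""
    V = (set(V) | {x}) if add else (set(V) - {x})
    new = dict(table)
    for y in cubes_containing(x):
        local = frozenset(cube_points(y) & V)
        if local:
            new[y] = local        # replacement part for this cube
        else:
            new.pop(y, None)      # cube no longer meets V
    return new, V

if __name__ == "__main__":
    V0 = {(0, 0, 0)}
    t1, V1 = update_point(build(V0), V0, (1, 0, 0))
    t2, V2 = update_point(t1, V1, (0, 1, 0))
    print(t2 == build(V2))
```

Because every cube entry is recomputed from the current point set, applying the same additions in a different order ends in the same table, which mirrors the order independence stated above.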



6 Conclusions 

In this paper, we studied an algebraic framework for 3-dimensional polyhedral
deformation in $Z^3$. First, we showed that there exists a finite number of
primitives of discrete polyhedra, which we call unit discrete polyhedra, in $Z^3$. We
also gave two set operations, addition and subtraction, for the family of sets
of unit discrete polygons. By using the unit discrete polyhedra and the
addition operation, we then presented the recursive definition of discrete polyhedra
and also showed that such a discrete polyhedron is uniquely obtained from a set
of volume data. Finally, we represented the deformation of discrete polyhedra by
using the above two set operations, which we call expanding and shrinking of
discrete polyhedra, and also showed that a sequence of discrete polyhedra can
be uniquely obtained from a sequence of volume data.
Abstract. In this paper, we propose an inverse quantization method
for planar binary images. The expansion and superresolution of digital
binary images involve the same mathematical properties because, to
achieve these processes, we are required to estimate the original
boundary from digitized images which are expressed as a collection of
pixels. We first estimate an area through which the original boundary
curve should pass. This area is an orthogonal polygon torus
whose two boundary curves are orthogonal polygons. Second, applying
the curvature flow operation to an orthogonal polygon in this area, we
estimate a smooth boundary.



1 Introduction 

In this paper, we propose an inverse quantization method for planar binary
digital images. If a shape is sampled and expressed as a digital shape, it is impossible
to reconstruct the high-resolution boundary exactly. Therefore, for the reconstruction of a
shape in high resolution from digital shapes, we are required to solve an
inverse problem which estimates the original boundary. The inverse quantization
of digital terrain data for the recovery of a smooth terrain surface and a series of
isolevel contours on it is solved as a variational problem. This is a surface
reconstruction method common in computer vision and areal data processing [1,
2, 4]. The expansion and super-resolution of digital binary images are the same
problem because, to achieve these processes, we are required to
construct a smooth boundary curve as an estimate of the original boundary from
digitized images, which are expressed as a collection of pixels.

Spline curves and surfaces are used for the boundary expression of objects
on a plane and in a space, respectively. Spline curves and surfaces are described
as the solution of a variational problem for the fitting of smooth functions to a
sequence of samples along a curve and an array of sample points on a surface.
Therefore, splines are utilized for the estimation of a smooth boundary from a
collection of samples [3-6]. Families of splines have attracted the attention of several authors
in the computer vision community for curve fitting [8], corner detection [9], shape
recovery [2], and detection of discontinuities along the boundary [7]. Spline
curves have also received considerable attention in meteorology for the description of isolevel
curves on weather charts [4, 5]. Furthermore, a family of splines has recently been
studied theoretically in the context of wavelets and practically for shape description,
with application to shape expression for data transmission through the Internet
[10, 11]. These applications partially go back to the application of splines in
computer vision for the data compression of boundary information.

We separate the process of the inverse quantization of digital shapes into
two steps. Using a morphological operation and curvature flow, we construct an
algorithm for the estimation of a smooth boundary curve from an orthogonal
polygon which is the boundary of a set of connected pixels. We first estimate a region
through which the original boundary curve should pass. This region is an
orthogonal polygon torus whose two boundary curves are orthogonal polygons.
Second, applying a deformation operation based on the curvatures of points on a
curve to an orthogonal polygon in this region, we estimate a smooth boundary
for the original binary shape. For the deformation of digital polygonal curves, we
define both local and global curvatures of polygonal curves. For the description
of the smooth boundary in each step of deformation, we adopt B-splines. Once a
smooth boundary is estimated, applying sampling again to the estimated shape,
we can generate digital images at any resolution. Therefore, as an application of
our algorithm, we convert the resolution of binary images. Numerical examples
show the performance of the proposed method.



2 Boundary Estimation 

Setting $x$-$y$ to be an orthogonal coordinate system on the Euclidean plane $R^2$,
we write a vector on $R^2$ as $a = (\alpha, \beta)^\top$, where $x^\top$ is the transpose of vector
$x$. Points for which both coordinates are integers are called lattice points on
$R^2$, and the set of all lattice points is denoted by $Z^2$. The 4- and 8-connected
neighborhoods of the origin are expressed as $N_4$ and $N_8$, respectively. If point
$x$ is in the 4- and 8-connected neighborhood of point $y$, the pair of points $x$ and $y$ is 4-
and 8-connected, respectively.

Pixel $U_{mn}$ is the unit square centered at $(m,n)^\top$. Setting $f(x,y)$ to be the
gray value of a binary image defined on $R^2$, let $f_{mn}$ be the average of the binary
image $f(x,y)$ in $U_{mn}$. For the collection of vectors $F = \{(m,n)^\top \mid f_{mn} = 1\}$, we
define three sets,

$$A = \{(x,y)^\top \mid f(x,y) = 1\}, \quad D = \bigcup_{(m,n)^\top \in F} U_{mn}, \quad B = \bigcup_{(m,n)^\top \in \Delta F} U_{mn} \tag{1}$$

for $\Delta F = \{(F \oplus N_8) \setminus F\} \cup \{F \setminus (F \ominus N_8)\}$, where $\oplus$ and $\ominus$ are the Minkowski
addition and subtraction, respectively, of two sets in a vector space. We assume
that the boundary $\partial A$ of the region $A$ is a continuous smooth simple curve.
Since we assume a 4-connected boundary, the original boundary curve $\partial A$ of an
image is contained in the region $B$, which is an orthogonal polygon curve of
finite width.
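The band of lattice points from which the region $B$ is assembled, written here as the union of (dilation of $F$ minus $F$) and ($F$ minus erosion of $F$), can be computed with plain set operations. The sketch below is illustrative only. It assumes that the 8-neighborhood structuring element includes the origin, and the names `dilate`, `erode` and `boundary_band` are hypothetical.

```python
# 8-connected neighborhood of the origin; including (0, 0) is an assumption.
N8 = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def dilate(F, S):
    """Minkowski addition: all sums p + s with p in F and s in S."""
    return {(x + dx, y + dy) for (x, y) in F for (dx, dy) in S}

def erode(F, S):
    """Minkowski subtraction: points p with p + s in F for every s in S."""
    return {(x, y) for (x, y) in F
            if all((x + dx, y + dy) in F for (dx, dy) in S)}

def boundary_band(F):
    """Outer ring (dilation minus F) together with inner ring (F minus erosion)."""
    F = set(F)
    return (dilate(F, N8) - F) | (F - erode(F, N8))

if __name__ == "__main__":
    block = {(x, y) for x in range(3) for y in range(3)}
    band = boundary_band(block)
    print(len(band))
```

For a 3 by 3 block of pixels the band consists of the 16 pixels just outside the block and the 8 pixels on its rim, leaving only the central pixel outside the band.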

Our problem is the reconstruction of the boundary $\partial A$ from a binary digital
image $D$. We adopt set $B$ as the first estimation of the original boundary $\partial A$.
Furthermore, as the second approximation of $\partial A$, we adopt the boundary of
$D$. The boundary of set $D$ is an orthogonal polygon curve $d(s)$ which lies in
the finite closed set $B$. We assume that a set of points $\{p_i\}_{i=1}^{n}$ on curve $d(s)$
is ordered in the counterclockwise direction and that points $p_{i \pm 1}$ and $p_i$ are
four-connected.

For the third approximation of $\partial A$, we construct a B-spline curve $p(s)$ of
order three, using the ordered points $\{p_i\}_{i=1}^{n}$ along polygonal curve $d(s)$. This
curve $p(s)$ passes through the area encircled by polygon torus $P$, which is the
union of the convex hulls defined by four successive points $p_i$, $p_{i+1}$, $p_{i+2}$, and
$p_{i+3}$ on $\partial D$, where $p_{n+i} = p_i$ for $i = 1, 2, \ldots, n$ [12]. This polygonal region $P$
is contained in region $B$. Therefore, the estimated curve $p(s)$ might be closer to
the original curve than the orthogonal polygon curve $d(s)$.
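The convex hull property used here can be checked numerically. The following sketch evaluates a closed uniform B-spline from a cyclic list of control points. It assumes that "order three" refers to the cubic (degree 3) uniform B-spline, which is an interpretation and not something the text states explicitly, and the function name `closed_cubic_bspline` is hypothetical.

```python
def closed_cubic_bspline(ctrl, samples_per_segment=8):
    """Sample a closed uniform cubic B-spline with cyclic control points.

    Each segment is a convex combination of four successive control
    points, so it stays inside their convex hull.
    """
    n = len(ctrl)
    curve = []
    for i in range(n):
        p0, p1, p2, p3 = (ctrl[(i + j) % n] for j in range(4))
        for s in range(samples_per_segment):
            u = s / samples_per_segment
            # Uniform cubic B-spline basis: nonnegative and summing to 1.
            b0 = (1 - u) ** 3 / 6
            b1 = (3 * u**3 - 6 * u**2 + 4) / 6
            b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
            b3 = u**3 / 6
            curve.append((
                b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
                b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
            ))
    return curve

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    pts = closed_cubic_bspline(square)
    print(len(pts), pts[0])
```

Since the four basis weights are nonnegative and sum to one on every segment, each sampled point lies in the convex hull of the four control points of its segment.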

The vertex angles of an orthogonal polygon are $\pi/2$ and $3\pi/2$, and the distance
between each pair of successive control points is one. These configurations of points on a
curve yield small smooth vibrations on the B-spline curve whose control points
lie on an orthogonal polygon. If we deform the polygonal boundary curve $P$ using the
discrete curvature flow

$$p_i(t+1) - p_i(t) = F(\theta_i), \tag{2}$$

where $\theta_i$ is the discrete curvature of point $p_i$ at the $t$-th iteration on a polygonal
curve and $F(x)$ is a function, we can write a minimization criterion as

$$E(p(s), p_i(t)) = \sum_{i=1}^{n} |p(s_i) - p_i(t)|^2 + \lambda \int_{0} \left| \frac{d^2 p(s)}{ds^2} \right|^2_{s=t} dt. \tag{3}$$



3 Deformation of Polygonal Boundary Curve 

The geometric configuration of sets $P$ and $B$ suggests generating a
series of polygonal regions $P(t)$ such that

- $P(t+1) \subset P(t)$ for $t = 0, 1, 2, \ldots$, and $P(0) = P$;

- $\lim_{t \to \infty} |P(t)| = 0$, where $|P(t)|$ is the areal measure of polygonal set $P(t)$.

Using the curvature of each point on a polygonal curve, we define an operation
which approximately generates this sequence of polygonal curves.

From the vertices of polygonal curve $\{p_i\}_{i=1}^{n}$, we define the vectors and their
average

n 

^ k^l 

for $k = 1, 2, \ldots, n$. The vector of order $k$ expresses the global configuration of points for $k > 1$,
since it is defined from $(2k+1)$ successive points $\{p_j\}_{j=i-k}^{i+k}$ around point
$p_i$.
The angle between vectors $(p_{i-k} - p_i)$ and $(p_{i+k} - p_i)$, and the average of
these angles, for $k = 1, 2, \ldots, n$, are defined as

$$\theta_i^k = \cos^{-1}\left( \frac{(p_{i-k} - p_i)^\top (p_{i+k} - p_i)}{|p_{i-k} - p_i| \cdot |p_{i+k} - p_i|} \right), \qquad \phi_i^n = \frac{1}{n} \sum_{k=1}^{n} \theta_i^k. \tag{5}$$

The angle $\theta_i^k$ expresses the local turn of a planar polygonal curve for $k = 1$. We
call $\theta_i^k$ the vertex angle of order $k$. The average of vertex angles describes
global turns of a planar polygonal curve. We call $\phi_i^n$ the average vertex angle
of order $n$. If the average vertex angle $\phi_i^n$ is larger than a threshold $\tau$, the
polygonal boundary curve is not locally smooth. Conversely, if the average vertex
angle $\phi_i^n$ is smaller than the threshold $\tau$, the polygonal boundary curve is locally
smooth.
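A small numerical sketch of the vertex angle of order $k$ and its average may help fix the definitions. It follows the wording of the text literally, taking the angle between the two vectors that point from a vertex to its $k$-th predecessor and $k$-th successor on a closed polygon. The function names are hypothetical, and the sketch assumes that the points involved are distinct, so that no vector has zero length.

```python
import math

def vertex_angle(points, i, k):
    """Angle at points[i] between the vectors to its k-th neighbours."""
    n = len(points)
    px, py = points[i]
    ax, ay = points[(i - k) % n][0] - px, points[(i - k) % n][1] - py
    bx, by = points[(i + k) % n][0] - px, points[(i + k) % n][1] - py
    cos_value = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp against rounding error before taking the inverse cosine.
    return math.acos(max(-1.0, min(1.0, cos_value)))

def average_vertex_angle(points, i, order):
    """Mean of the vertex angles of order 1, ..., order at points[i]."""
    return sum(vertex_angle(points, i, k) for k in range(1, order + 1)) / order

if __name__ == "__main__":
    # Lattice points on the boundary of a 2 x 2 square, counterclockwise.
    ring = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    print(vertex_angle(ring, 1, 1), vertex_angle(ring, 2, 1))
```

With this literal reading, a vertex in the middle of a straight edge gives the angle pi and a corner of the orthogonal polygon gives pi/2 for order one.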

If a curve is not locally smooth at a point, we deform the vertex inward.
Furthermore, if a curve is smooth globally, we deform the vertex to enhance the global
shape in a finite region. Based on these rules, we can describe the equation for
the deformation of a polygon as

$$p_i(t+1) - p_i(t) = \begin{cases} \alpha(t)\, u_i(t), & \text{if } \phi_i^n > \tau, \\ \alpha(t)\, u_i(t) - \beta(t)\, v_i(t), & \text{otherwise,} \end{cases} \tag{6}$$

for a pair of monotonically decreasing positive functions $\alpha(t)$ and $\beta(t)$ such that
$\lim_{t \to \infty} \alpha(t) = 0$ and $\lim_{t \to \infty} \beta(t) = 0$. These requirements for the coefficients
of the recursive form might preserve the condition $p_i(t+1) = p_i(t)$ for large $t$.
If this equality is satisfied, points on $P(t)$ remain in a finite region along a finite
polygonal curve.

For point $p_i^\infty$, which is defined as $p_i^\infty = \lim_{t \to \infty} p_i(t)$, setting $P = \{p_i^\infty\}_{i=1}^{n}$,
the boundary $\partial A$ of the support of the binary function $f(x,y)$ is estimated as
the B-spline curve whose control points are elements of $P$. Here, we set $n = 3$,
since the configuration of seven successive points determines the local shape of
a curve expressed by B-spline polynomials of order three.

One of the advantages of B-spline polynomials for the expression of curves
is that they approximate the original curve using few control
points. Therefore, we derive an algorithm for the reduction of the number of
control points from $\{p_i\}_{i=1}^{n}$ to $P_o = \{p_i^o\}_{i \in I}$, where $I$ is a subset of the integers from 1
to $n$. Setting $\phi_i^n$ to be the average vertex angle of polygonal curve $P$, for the
generation of $P_o$ from $P$, we adopt point $p_i$ which satisfies one of the following
conditions.

- The average vertex angle $\phi_i^n$ is larger than a predetermined threshold $\phi$.

- If $p_i$ is an element of point set $P_o$, then point $p_{i+3}$ is an element and points
$p_{i+1}$ and $p_{i+2}$ are not.

We set $n = 3$ for the computation of the average vertex angle, since we use
B-spline polynomials of order three.
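One possible reading of the two selection conditions is a single greedy pass over the control points. This is a sketch of that interpretation, not the authors' algorithm: a point is kept when its average vertex angle exceeds the threshold, or when it lies three positions after the last kept point. Keeping the first point unconditionally and the name `reduce_control_points` are assumptions made for illustration.

```python
def reduce_control_points(angles, threshold, stride=3):
    """Return the indices of control points kept by a greedy pass.

    angles[i] is the average vertex angle at point i. A point is kept if
    its angle exceeds the threshold, or if it lies `stride` positions
    after the last kept point. The first point is always kept (assumption).
    """
    kept = []
    last = None
    for i, angle in enumerate(angles):
        if last is None or angle > threshold or i - last >= stride:
            kept.append(i)
            last = i
    return kept

if __name__ == "__main__":
    flat = [0.0] * 9
    print(reduce_control_points(flat, 0.5))
    bumpy = list(flat)
    bumpy[4] = 1.0
    print(reduce_control_points(bumpy, 0.5))
```

On a run of points with small angles the pass keeps every third point, and a point with a large angle is kept in addition and restarts the count.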
4 Generation of High Resolution Images

The generation of a high-resolution image of $f(x,y)$ is mathematically achieved
first by computing a binary set, applying the sampling scheme to an expanded set of
the support of $f(x,y)$, and second by reducing the length of the edges of pixels.
Here, we define the resolution using the edge length of the pixels. For a set of points $F$,
denoting $\lambda F = \{\lambda x \mid x \in F, \lambda > 0\}$, this process is mathematically described as
the computation of $\frac{1}{m} F_m$, where $F_m$ is a binary set computed from the binary
image $f(mx, my)$. We derive an algorithm for the generation of high-resolution
digital images of $f(x,y)$ from set $F$, which is obtained using the sampling scheme
with ordinary pixels.

The generation of a high-resolution digital image of $f(x,y)$ from set $F$ is
mainly achieved by estimating $m\partial A$ from $F$, since the boundary of $mF$ is an
orthogonal polygon. If an expanded set $m\partial A$ is estimated, we can construct an
approximation of set $mA$ from set $m\partial A$. Furthermore, set $m\partial A$ enables us to
generate an approximation of high-resolution images of $f(x,y)$ for an arbitrary
resolution.

In the previous section, we proposed an algorithm for the estimation of
boundary curve $\partial A$ from digital set $F$. Therefore, using the estimation $P$, we
generate set $\frac{1}{m} F_m$ according to the following steps.

1. Compute $P$ from $F$.

2. Compute the B-spline curve from $mP$, and adopt its closure as the estimator
of $m\partial A$.

3. Apply the sampling scheme to the closure of the curve using unit pixels.

4. Reduce the size of pixels uniformly.
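Steps 2 to 4 can be illustrated with a simplified stand-in. In the sketch below the estimated boundary is a closed polygon rather than a B-spline curve, the sampling scheme is reduced to testing pixel centres at lattice points, and the function names are hypothetical. The boundary is scaled by $m$, sampled with unit pixels, and the resulting pixel set is then interpreted on a grid whose pixel edge is $1/m$.

```python
import math

def point_in_polygon(x, y, poly):
    """Even-odd ray casting test for a closed polygon given as vertex list."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def scale(poly, m):
    """Expand the boundary by the factor m about the origin."""
    return [(m * px, m * py) for px, py in poly]

def sample(poly):
    """Unit pixels (lattice points) whose centre lies inside the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    pixels = set()
    for i in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
        for j in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
            if point_in_polygon(i, j, poly):
                pixels.add((i, j))
    return pixels

if __name__ == "__main__":
    boundary = [(-0.4, -0.4), (1.4, -0.4), (1.4, 1.4), (-0.4, 1.4)]
    m = 2
    fine = sample(scale(boundary, m))
    # Each pixel (i, j) of `fine` is centred at (i / m, j / m) with edge 1 / m.
    print(len(sample(boundary)), len(fine))
```

Sampling the centre only is a simplification of the averaging described in Section 2; it is used here to keep the example short.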

5 Numerical Examples 

In Figures 1, 2 and 3, we show examples of the generation of high-resolution
images from digital binary images using the algorithm derived in the previous
sections. In each figure, (a) shows the binary image obtained from (b) using the
sampling scheme described in section 3. Figures (c) and (d) show the estimated
boundary curve and the original boundary curve, respectively. In figure (e),
the region in which the original boundary exists is shown. Points marked by
the label + in figure (f) express lattice points on the boundary of the support of
the shape shown in figure (a). Points marked by + in figure (g) express the
configurations of control points after deformation of the polygonal curve shown
in (f). The curve in figure (h) is the estimated boundary curve after the reduction
of control points. Points marked by + in figure (i) represent control points for the
expanded images. These figures show that our algorithm accurately generates a
high-resolution image from a given digital image.

Here, we set parameters $\tau = \pi/3$, $\phi = \pi/18$, and $T = 100$, where $T$ is the
maximum number of iterations for the flow computation. Furthermore, we set

            $\alpha$     $\beta$      $n_o$   $n_e$
Figure 1    2       2       187   41
Figure 2    100/7   100/7   396   101
Figure 3    2/5     2/5     332   198

for $\alpha(t) = \alpha t^{-1}$ and $\beta(t)$ with coefficient $\beta$, where $n_o$ and $n_e$ are the numbers of initial control
points and reduced control points, respectively. In these examples, we could reduce the
number of sample points.



6 Conclusions 

In this paper, using a morphological operation and curvature flow, we
constructed an algorithm for the estimation of a smooth boundary curve of the
original image from an orthogonal polygon which is the boundary of connected
pixels in a given digital image. Our algorithm first estimates an area through
which the original boundary curve should pass. Second, applying the curvature
flow operation to an isopolygon, our algorithm estimates a smooth boundary for
the original binary shape. Using this estimation of the boundary curve, we can
generate binary digital images at any resolution. Numerical examples confirmed
the performance of the proposed method.
Fig. 2. Scale Space Analysis of an Image Sequence.
Fig. 3. Resolution Conversion for Geographical Data.